Latent property diagnosing procedure

ABSTRACT

The present invention provides a method of doing cognitive diagnosis of mental skills, medical and psychiatric diagnosis of diseases and disorders, and, in general, the diagnosing of latent properties of a set of objects, usually people, for which multiple pieces of binary (dichotomous) information about the objects are available, for example testing examinees using right/wrong scored test questions. Settings where the present invention can be applied include, but are not limited to, classrooms at all levels, web-based instruction, corporate in-house training, large scale standardized tests, and medical and psychiatric settings. Uses include, but are not limited to, individual learner feedback, learner remediation, group level educational assessment, and medical and psychiatric treatment.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention provides a method of doing cognitive, medical, and psychiatric diagnosis, and in general diagnosis of latent properties of objects (usually people), using binary scored probing of the objects.

2. Description of the Prior Art

Part 1: Background Prerequisite to Description of Prior Art

Standardized Testing as Currently Practiced; Cognitive Diagnosis Defined. Before describing the prior art related to the invention, it is necessary to discuss needed background material. Both large scale standardized testing and classroom testing typically use test scores to rank and/or locate examinees on a single scale. This scale is usually interpreted as ability or achievement in a particular content area such as algebra or the physics of motion. Indeed, the two almost universally used approaches to “scoring” standardized tests, namely classical test theory (Lord, F. and Novick, M., 1968, Statistical Theories of Mental Test Scores, Reading, Mass., Addison-Wesley; although an old book, it is still the authority on classical test theory) and “unidimensional” item response theory (IRT), assign each examinee a single test score. An “item” is merely terminology for a test question. The standardized test score is usually the number correct on the test, but its determination can include partial credit on some items or the weighting of some items more than others. In classroom testing, teachers also typically assign a single score to a test.

The result of this single score approach to testing is that the test is used only either to rank examinees among themselves or, if mastery standards are set, to establish examinee levels of overall mastery of the content domain of the test. In particular, it is not used to produce a finely grained profile of examinee “cognitive attributes” within a single content domain. That is, an algebra test can be used to assess John's overall algebra skill level relative to others or relative to the standard for algebra mastery, but it cannot determine cognitive attribute mastery, such as whether John factors polynomials well, understands the rules of exponents, understands the quadratic formula, etc., even though such fine grained analyses are clearly desired by instructor, student, parent, institution, and government agency alike.

Herein, cognitive diagnosis refers to providing fine-grained profiles of examinee cognitive attribute mastery/non-mastery.

Statistical Method or Analysis. The cognitive diagnostic algorithm that forms the core of the invention is a particular statistical method. A statistical method or analysis combines collected data and an appropriate probability model of the real world setting producing the data to make inferences (draw conclusions). Such inferences often lead to actual decision-making. For instance, the cognitive diagnosis indicating that Tanya is deficient in her mastery of the quadratic formula can be followed up by providing remediation to improve her understanding of the quadratic formula.

To clarify what a statistical method is, an overly simple, non-cognitive example is illustrative. As background, it is worth noting that a valuable aspect of statistical methods is that they explicitly state the inherent error or uncertainty in their inferences. In particular, a valid statistical analysis is careful not to draw inferences that go beyond what is reasonably certain based on the available information in the data; it accomplishes this by including a measure of the uncertainty associated with the inference, such as the standard error, a fundamental statistical concept. This makes any statistical method for doing cognitive diagnosis superior to any method based on a deterministic model (variously called rule-based, artificial intelligence, data-mining, etc., depending on the particular deterministic approach taken).

The difference between a deterministic inference and a statistical inference is illustrated in a simple setting. A coin is claimed to be loaded in favor of coming up heads. It is tossed 10 times and produces 7 heads. The non-statistical, deterministic approach, with its inherent failure to address possible inference error or uncertainty, simply reports that the inferred probability p of heads is 0.7 and hence concludes that the claim is true. The statistical approach reports that even though the most likely probability p of heads is indeed 0.7, nonetheless, because of the uncertainty of this inference due to the very limited amount of data available, all that can really be confidently concluded is that 0.348 ≤ p ≤ 0.933. Since this interval includes the fair value p = 0.5, from the statistical inference perspective there is not strong evidence that the coin is unfair. This statistical perspective of appropriate caution is the superior way to proceed.
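The interval 0.348 ≤ p ≤ 0.933 appears to be the exact (Clopper-Pearson) 95% binomial confidence interval. The text does not name its method, so that identification is an assumption, although the minimal standard-library sketch below does reproduce both endpoints. The function names are illustrative only. Each limit is found by bisection on the binomial tail probability, which is monotone in p.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, iters: int = 100) -> float:
    """Bisection on [0, 1] for the p where an increasing function g reaches target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) 1 - alpha confidence interval for a binomial p."""
    # Lower limit: the p at which P(X >= k) = alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= k) = alpha / 2, i.e. P(X > k) = 1 - alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


lo, hi = clopper_pearson(7, 10)
print(f"7 heads in 10 tosses: {lo:.3f} <= p <= {hi:.3f}")
```

Calling `clopper_pearson(4, 4)` in the same way gives roughly (0.40, 1), which matches the interval quoted for the four-heads over-fitting illustration.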

Similarly, cognitive diagnoses using the Unified Model (UM) discussed hereafter will only assign attribute mastery or attribute non-mastery to an examinee for a particular attribute when the examinee test data provide strong evidence supporting the particular conclusion drawn, such as Jack's mastery of the algebraic rules of exponents.

A non-cognitive example of a statistical method is now given in more detail than the illustration above.

Example 1

A drug with unknown cure probability p (a number between 0 and 1) is administered to 40 ill patients. The result is that 30 are cured. The standard binomial probability model is assumed (that is, it is assumed that the patients respond independently of one another and that there is the same probability of cure for each patient). Based on this model and the data, it is statistically inferred from the mathematical properties of the binomial probability model that the best estimate of the cure rate is p = 0.75, with confidence that the error in this estimate is less than ±0.14. Thus the inference to be drawn, based on this limited amount of data, is that p lies in the interval (0.60, 0.89). By contrast, if there were 400 patients in the drug trial (that is, much more data), with 300 cures occurring, then it would be inferred that p = 0.75 as before, but now with confidence that the estimation error is less than ±0.04. More data provides more confidence that the inherent uncertainty in the inference is small.
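The text does not say how the ±0.14 and ±0.04 error bounds were computed. A minimal sketch, assuming they are two standard errors of the sample proportion (the normal approximation with z = 2, a hypothetical choice), reproduces both figures and shows the bound shrinking by a factor of about √10 ≈ 3.16 when the number of patients grows tenfold. The endpoints of such an approximate interval can differ in the second decimal from those of other interval methods.

```python
from math import sqrt


def margin_of_error(successes: int, trials: int, z: float = 2.0) -> tuple[float, float]:
    """Return (p_hat, z * standard error) for a binomial proportion (normal approximation)."""
    p_hat = successes / trials
    se = sqrt(p_hat * (1 - p_hat) / trials)  # standard error of the sample proportion
    return p_hat, z * se


for cured, patients in [(30, 40), (300, 400)]:
    p_hat, moe = margin_of_error(cured, patients)
    print(f"{cured}/{patients} cured: p = {p_hat:.2f} +/- {moe:.2f}")
```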

Educational Measurement, Item Response Theory (IRT), and the Need for Educational Measurement/IRT-based Cognitive Diagnostic Models. The current paradigm that dominates probability modeling of educational test data is item response theory (Embretson, S. and Reise, S., 2000, Item Response Theory for Psychologists, Mahwah, N.J., Lawrence Erlbaum). This assigns a probability of getting an item right to be a function of a single postulated latent (unobservable) ability variable, always interpreted as a relatively broad and coarse-grained ability like algebra ability. Different examinees are postulated to possess different levels of this latent ability. Since the higher the level, the greater the probability of getting the item right, it is justified to call this latent variable “ability”. FIG. 1 shows the standard logistic item response function (IRF) of an item as a function of ability θ. Each such function provides P(θ), the probability of getting an item right for a typical examinee of ability θ.
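The text shows the logistic IRF only as a figure. A minimal sketch, assuming the common form P(θ) = 1 / (1 + exp(−a(θ − b))) with the conventional scaling constant a = 1.7 and a hypothetical difficulty b = 0 (neither value is given in the text), evaluates the S-shaped curve across the usual ability range.

```python
import math


def irf(theta: float, a: float = 1.7, b: float = 0.0) -> float:
    """Assumed logistic IRF: probability of a correct answer at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(f"theta = {theta:+d}: P = {irf(theta):.3f}")
```

The curve rises monotonically from near 0 to near 1, and an examinee whose ability equals the item difficulty b has a probability of exactly 0.5 of answering correctly.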

Typically, as herein, the scale for examinee ability is such that ability less than −2 indicates very low ability examinees (roughly the lowest 2.5%), 0 indicates an average ability examinee, and above 2 indicates very high ability examinees (roughly the highest 2.5%). IRT-based statistical methods are currently heavily used in educational measurement to statistically assess (infer from test data and the IRT model) examinee latent ability levels.

Educational measurement is the applied statistical science that uses probability models and statistical methods to analyze educational data (often test data) to provide information about learning processes and about various educational settings, and to evaluate individual level and group level (state, school district, nation, etc.) intellectual performance.

A modern development receiving major emphasis in educational measurement is the attempt to develop new measurement models of test settings that allow one, through statistical analysis of test data, to cognitively diagnose examinees. Cognitive diagnosis, as already indicated, refers to a relatively fine-grained analysis that evaluates examinees in terms of which specific skills (generically called “attributes”) in a general subject area each examinee possesses or lacks (see Frederiksen, N., Glaser, R., Lesgold, A., and Shafto, M., 1990, Diagnostic Monitoring of Skill and Knowledge Acquisition, Mahwah, N.J., Lawrence Erlbaum; and Nichols, P., Chipman, S., and Brennan, R., 1995, Cognitively Diagnostic Assessment, Hillsdale, N.J., Erlbaum, for edited sets of articles dedicated to modern cognitive diagnosis). These two examinee states are referred to as mastery (possessing the attribute) and non-mastery (lacking the attribute). Take algebra, for example, and recall the partial list of algebra attributes given above: factoring, quadratic formula, etc. Rather than just using an examinee's test performance to assign an algebra score, cognitive diagnosis focuses on assessing an examinee with respect to these individual algebra attributes. For example, based on test performance, an examinee might be judged to have “mastered” the quadratic formula but not to have mastered factoring. Such cognitive diagnostic capabilities are obviously of great practical importance both for standardized testing and for testing used in instructional settings, such as those occurring in the classroom or using learning-at-a-distance web-based courseware.

Example 2

A need for cognitive diagnosis. One of the inventors, an instructor of a college level introductory statistics course, gave an exam on the first three chapters of the text. The items were constructed to represent the distinct concepts taught in the three chapters. It was desired to evaluate the students by more than their score on the exam, specifically by how well they understood the concepts that were taught. After the test was constructed, a list of the eight concepts, or attributes, was compiled: (1) histogram, (2) median/quartile, (3) average/mean, (4) standard deviation, (5) regression prediction, (6) correlation, (7) regression line, and (8) regression fit. As expected, some items involved more than one attribute. On the forty-item exam, each attribute appeared in an average of six items. Evaluating the test at the attribute level instead of using the total score would help the instructor determine which areas each student needed to review, and it would help each student identify what he or she should study. This example is developed into a simulated example of the present invention in the Description of the Preferred Embodiments section hereafter.

In spite of its clear potential value to society, cognitive diagnosis, a difficult area of application, has been slow getting off the ground. Mathematical models developed by cognitive scientists/psychologists and computer scientists for scholarly purposes are designed with a purpose other than cognitive diagnosis in mind, namely to understand in detail how mental cognitive processing occurs, and often also how it evolves over time (learning). As such, these models are inherently ill-suited for cognitively diagnostic purposes. They are both deterministic and parametrically very complex, and for both reasons they tend to perform poorly when used to do cognitive diagnosis in typical test settings with simply scored items, where the amount of data is limited and the data are clearly subject to random variation. Just because an examinee is judged to have mastered the major relevant attributes needed to answer an item correctly, it does not follow that the examinee will indeed get the item right. Similarly, lack of mastery of one required major relevant attribute does not guarantee that an examinee will get the item wrong.

Positivity Introduced. A lack of consistency with what is predicted by the deterministic cognitive model is what is called positivity. It is simply the aspect of a measurement model that admits a probabilistic structure linking attribute mastery and correct use of the mastered attribute in solving an item. For example, Towanda may be judged a master of the rules of exponents but may apply her understanding of exponents to an item incorrectly, because the competency concerning the rules of exponents needed for the item she is trying to solve is exceptionally high, and in fact higher than that possessed by Towanda, even though she is a master of the attribute rules of exponents.

Overfitting the Data: a Fatal Flaw in Doing Inference Using Deterministic Models. It has already been discussed that deterministic models can go beyond the available information in the data by ignoring the inherent uncertainty in the data and thereby “over-predicting”. In particular, because of their tendency to over-predict, such deterministic “data-mining” models, as recently lampooned in the comic strip Dilbert, can find seemingly systematic and thus reportable patterns in the data that are simply accidents of random noise and do not represent anything real. Predictions based on them often do not hold up in new analogous data sets and thus are unreliable and dangerous. Statisticians call this phenomenon of looking at random noise and inferring a systematic “signal”, or pattern in the data, over-fitting the data. Such over-fitting is a direct consequence of not including information about the level of uncertainty in the inference process involved.

A variation of the simple coin tossing illustration discussed earlier may help illustrate the over-fitting issue. If a possibly unfair coin is tossed four times and comes up heads all four times, the most simplistic, over-fitted deterministic approach might conclude that the coin will always come up heads, thus predicting that the pattern to be expected for new coin tossing will be to always get heads. In contrast, the probabilistic statistical approach merely concludes that all that can be inferred is that the unknown probability of heads lies in the interval (0.4, 1). From this appropriately cautious perspective, it is thus quite possible that the coin is actually fair!

The UM, upon which the present invention is in part based, is statistical and hence, crucially, avoids over-fitting the data by predicting attribute masteries and non-masteries for examinees only when there is strong evidence to support such predictions.

The widely used probabilistic “unidimensional” IRT models, while tractable both mathematically and statistically and hence able to cope appropriately with random examinee variation by their probabilistic nature (in particular, not over-fitting the data), are unfortunately too parametrically simplistic to serve as vehicles that theoretically underpin fine-grained cognitive diagnoses. That is, these models deal with ability at the coarse-grained level (e.g., ability in introductory statistics) and as such are incapable of dealing with ability at the fine-grained cognitive attribute level (e.g., mastery or not of interpreting histograms, calculating means, etc.).

There is a new and promising effort to marry the deterministic cognitive science tradition and the probabilistic measurement/IRT tradition to produce tractable and realistic probabilistic cognitive diagnostic models that function at the cognitive attribute level.

These new models are far more complex than the standard IRT models. However, they are far less complex than the typical deterministic cognitive science models discussed above. In particular, they avoid over-fitting the data. The UM is one of these new complex probabilistic models.

Part 2: Description of Prior Art

Probably the first cognitively oriented measurement model to function in the IRT tradition is Gerhard Fischer's linear logistic test model (Fischer, G., 1973, The linear logistic test model as an instrument in educational research, Acta Psychologica, 37, 359-374). This model is of historical interest only, because by its nature it cannot actually do cognitive diagnosis of examinee test data. By now, however, there are several important IRT-based models that focus on the cognitive modeling of test responses, each of which constitutes prior art. In particular, the statistical models of Kikumi Tatsuoka, Robert Mislevy, Susan Embretson, and Brian Junker, as detailed below, are the relevant examples from the prior art perspective. Further, an early, primitive, incomplete, and unusable version of the UM utilized by the present invention appeared in DiBello, L., Stout, W., and Roussos, L., 1995, Unified Cognitive Psychometric Assessment Likelihood-Based Classification Techniques, in Nichols et al., Cognitively Diagnostic Assessment, Mahwah, N.J., Lawrence Erlbaum; it is central both from the prior art perspective and for understanding the current UM. The non-probabilistic (deterministic) cognitive models are numerous and highly specialized. They are very distinct from the UM in approach and ill-suited for practical cognitive diagnosis.

The Prior Art UM Procedure Proposed in DiBello et al. The 1995 version of the UM is the most relevant instance of prior art.

The flow chart of FIG. 2 illustrates the UM Cognitive Diagnostic (UMCD) procedure as proposed in DiBello et al. Some of its elements are common to the current UMCD algorithm of the present invention. The present invention uses innovations and modifications of the UM approach proposed by DiBello et al. As background, it is assumed that there are items i = 1, 2, . . . , n, examinees j = 1, 2, . . . , N, and attributes k = 1, 2, . . . , K. The result of administering the test is the examinee response data matrix $X = \{X_{ij}\}$.

Here X is random, reflecting the fact that a test administration is modeled as a random sampling of examinees who then respond randomly to a set of test items. Then X = x is the result of carrying out an actual test administration and producing observed data x (Block 207 of FIG. 2). Thus x is an n by N matrix of 0s (0 denoting an incorrect response for an item/examinee combination) and 1s (1 denoting a correct response for the item/examinee combination). The jth column represents the responses to the n test items for a particular examinee j. For example, if two examinees took a three item test, then x might be

$$x = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 0 \end{pmatrix},$$

indicating that the first examinee got the first two items right and the second examinee got only the second item right.

It should be noted that a parameter of a scientific model in general, and of a probability model in particular, is an unknown quantity in the model that must be statistically determined from data for each particular application of the model, with the value of this parameter varying from application to application. The parameters of the n item, N examinee UM, generically denoted by ω, are given by

$$\omega = (\alpha, \theta; r, \pi, c),$$

where (α, θ) are the examinee parameters and (r, π, c) are the item parameters, the latter sometimes referred to as the test structure. Often examinee parameters will be subscripted by j to indicate that they are those of Examinee j, and item parameters will be subscripted by i, or by both i and k, to indicate that they belong to Item i and possibly are specific to Attribute k. Each of the parameters of ω is carefully explained below. The flow chart in FIG. 2 diagrams the main stages of how one would, in principle, use the UM of DiBello et al. to carry out a cognitive diagnosis (in practice, such diagnoses were not possible with the 1995 UM of DiBello et al.). In fact, statistical cognitive diagnostic procedures typically have much in common with FIG. 2, with one essential difference usually being in how the probability model f(X|ω) is built.

Basic concepts of the UM presented in DiBello et al. are explained by referring often to FIGS. 2 and 3. As an illustration of the typical dimensions of a cognitive diagnostic setting, in our diagnostic application to the classroom statistics test there were N = 500 examinees, viewed as the approximate number of students taking an introductory statistics course in a large university. This example is developed into a simulation example demonstrating the cognitive diagnostic effectiveness of the present invention, discussed below in the Description of the Preferred Embodiments section. The examination had n = 40 items, testing the statistical content from the first three chapters of the textbook used in the course. It is assumed that different items require different combinations of the K attributes. In our example, K = 8, the number of major concepts tested on the statistics exam.

Recall that an “attribute” is a general term for any bundle of knowledge that can be judged as mastered or not mastered. The selected attributes (Block 201 of FIG. 2) used to build the item/attribute incidence matrix (Block 205 of FIG. 2) are defined by the user of the algorithm and can be anything the user wishes. Indeed, the freedom of the user to choose attributes unconstrained by any particular cognitive theory of learning and/or mental processing is a real strength of the UM. That is, unlike many other approaches to cognitive diagnosis that embrace, and hence depend on understanding and accepting, a particular theory of cognitive mental processing, the UM allows the user to select any attributes based on any conceptualization of learning, mental functioning, or cognition, even a highly informal structure that would be accessible to an instructor of a typical classroom course. Each of the N examinees has K attributes, and hence the α component of ω is a matrix of dimension N by K. Here each row of α corresponds to a single examinee and has K elements (0s and 1s). A 0 indicates examinee non-mastery and a 1 indicates examinee mastery.

The purpose of a UM-based cognitive diagnosis is to use the available test data x that result from administering the test (Block 207 of FIG. 2) to infer (Block 213 of FIG. 2), for each examinee, which of the K attributes there is strong evidence that she has mastered and which there is strong evidence that she has not mastered (noting that for each examinee there will likely be certain attributes for which there is not strong evidence of either mastery or non-mastery).

The required input data to initiate the proposed UM algorithm consist of two data files that are relatively easy to understand and produce without the user needing a sophisticated understanding of cognitive science, this being an advantage of the UMCD relative to other prior art. First, for every item, a list of the attributes required to be simultaneously mastered in order to correctly solve the item is selected (Block 201 of FIG. 2). Often, the user/practitioner first decides which attributes to cognitively diagnose in the particular educational setting and then constructs the needed test items (Block 203 of FIG. 2). Sometimes the user constructs the test items first and then selects the attributes to be diagnosed.

Then the user decides for each item which of these attributes are required, thus producing the n by K item/attribute incidence matrix (Block 205 of FIG. 2). An example of an item/attribute incidence matrix for the statistics test diagnostic example is given in FIG. 18, described in the Description of the Preferred Embodiments section.

It is emphasized that the user of a UM-based diagnostic algorithm, such as a school district curriculum specialist or college instructor, typically carries out the activities in Blocks 201, 203, and 205 of FIG. 2, namely selecting attributes, constructing test items, and building the item/attribute incidence matrix. In particular, the user typically chooses the relevant attributes and designs the questions to measure these attributes (in either order), and then decides which of the chosen attributes are required for the correct solution of each item. This relatively easy user activity may be assisted by consultants with personal knowledge of UMCD or by referencing a UMCD tutorial presenting the basic principles of good cognitive diagnostic item construction, attribute definition, and incidence matrix construction for use with the UMCD program.

As an example of an item/attribute incidence matrix, consider three items and four attributes. Then the incidence matrix, with rows indexing items and columns indexing attributes,

$$\begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

defines that Item 1 requires Attributes 2 and 3, Item 2 requires Attribute 1, and Item 3 requires Attributes 3 and 4.
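The incidence matrix is simply a table of 0/1 flags, one row per item. The sketch below stores the three-item example as nested Python lists and reads off each item's required attributes. The helper `masters_all_required` is a hypothetical illustration of the deterministic reading of “required to be simultaneously mastered”; it is not the UM itself, which makes the response probabilistic through positivity.

```python
# Item/attribute incidence matrix from the example: rows are items, columns are attributes.
Q = [
    [0, 1, 1, 0],  # Item 1
    [1, 0, 0, 0],  # Item 2
    [0, 0, 1, 1],  # Item 3
]


def required_attributes(q_row):
    """Return the 1-based numbers of the attributes flagged 1 for an item."""
    return [k + 1 for k, flag in enumerate(q_row) if flag == 1]


def masters_all_required(q_row, alpha):
    """Hypothetical helper: True if a 0/1 mastery vector covers every attribute the item requires."""
    return all(a == 1 for q, a in zip(q_row, alpha) if q == 1)


for i, row in enumerate(Q, start=1):
    print(f"Item {i} requires attributes {required_attributes(row)}")
```

For instance, an examinee with mastery vector (1, 1, 1, 0) masters everything Item 1 requires but lacks Attribute 4, which Item 3 requires.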

Second, based on the administering of the test to the examinees, the examinee response data consist of a record for each examinee of which items were answered correctly and which items were answered incorrectly. Notationally, this is expressed as follows:

$$X_{ij} = \begin{cases} 0 & \text{if Examinee } j \text{ answered Item } i \text{ incorrectly,} \\ 1 & \text{if Examinee } j \text{ answered Item } i \text{ correctly.} \end{cases}$$

For example, consider the test responses of two examinees to four items.

- Examinee 1 responses: 0 0 1 1
- Examinee 2 responses: 1 0 0 1

This shows that Examinee 1 got Items 3 and 4 right, and Examinee 2 got Items 1 and 4 right. As already indicated, all of these $x_{ij}$ responses are collected together to form the n by N matrix x of examinee responses, in which each examinee's responses form one column.
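Because each examinee's record is a row of responses while x is laid out as n items by N examinees, assembling x is a transpose. A minimal sketch using plain Python lists (the dictionary layout of the records is an illustrative assumption):

```python
# One record per examinee: responses to Items 1..4 (1 = correct, 0 = incorrect).
records = {
    1: [0, 0, 1, 1],
    2: [1, 0, 0, 1],
}

# Arrange as the n-by-N matrix x: rows are items, columns are examinees.
examinees = sorted(records)
x = [list(col) for col in zip(*(records[j] for j in examinees))]

for col, j in enumerate(examinees):
    correct = [i + 1 for i, row in enumerate(x) if row[col] == 1]
    print(f"Examinee {j} answered items {correct} correctly")
```

Reading down a column of x recovers the items each examinee answered correctly: Items 3 and 4 for Examinee 1 and Items 1 and 4 for Examinee 2.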

Recall that for each examinee j, $\boldsymbol{\alpha}_j$ denotes the (unknown) latent vector of length K indicating, for each of the K attributes, examinee mastery (denoted by a 1) or examinee non-mastery (denoted by a 0). For example,

$$\boldsymbol{\alpha}_j = (1, 0, 1, 1, 0)$$

means that Examinee j has mastered Attributes 1, 3, and 4 and has not mastered Attributes 2 or 5. Inferring α for each examinee is the goal of cognitive diagnosis.

Block 209 of FIG. 2, which occurs after building the incidence matrix (Block 205 of FIG. 2), consists of building the probability model f(X|ω), recalling that ω = (α, θ; r, π, c) denotes the item and examinee parameters of the n item by N examinee model. To understand this block, which is the heart of the UM, certain technical concepts must be introduced. It is especially useful here to refer to the schematic of the UM probability model given in FIG. 3 for one item/examinee response $X_{ij}$.

The Basic Equations of the DiBello et al. UM as Partially Indicated by FIG. 3. The UM uses the notion of an item response function (IRF), as do all IRT-based models. An IRF is an increasing S-shaped curve bounded by 0 below and 1 above. In the usual IRT model setting, it provides the probability of getting an item correct as a function of a continuous latent ability such as statistics ability, traditionally denoted by θ. Graphically, such an IRF is represented in FIG. 1. The notation P(θ) refers to the probability of getting the item correct for an examinee of latent ability θ. The formulas for the UM depend on using the FIG. 1 IRF.

The basic building block of the UM (Block 209 of FIG. 2) is to develop an expression for the probability of a correct response to Item i by Examinee j, where the examinee possesses a latent residual ability $\theta_j$ and a latent attribute vector $\boldsymbol{\alpha}_j = (\alpha_{j1}, \ldots, \alpha_{jK})$, each component $\alpha_{jk}$ equaling 0 or 1 according as Attribute k is not mastered or is mastered. The probability model for one examinee responding to one item is

$$\Pr(X_{ij} = 1 \mid \omega) = S_{ij} \times P(\theta_j + c_i), \qquad (1)$$

where the IRF P is given in FIG. 1 and $S_{ij}$ is explained below. Here “| ω” simply means that the probability that $X_{ij} = 1$ is computed when the parameter values are equal to ω. A schematic representing the parametric influences producing the basic Equation (1) is given in FIG. 3. Because the only possible values for $X_{ij}$ are 1 and 0, elementary probabilistic logic yields

$$\Pr(X_{ij} = 0 \mid \omega) = 1 - \Pr(X_{ij} = 1 \mid \omega).$$

Moreover, in IRT, examinees are modeled to respond independently of each other. Also, by the basic IRT modeling principle of local independence, responses to different items by examinees all having the same values of the examinee parameters (α, θ) are modeled to be independent of each other. In probability models, the probability of a set of independent events all happening simultaneously is obtained by multiplying the probabilities of the individual events together. Thus, for the set of all N examinees and n items, the single item and examinee model of Equation 1 becomes

$$f(x \mid \omega) = \Pr(X = x \mid \omega) = \prod_{j=1}^{N} \prod_{i=1}^{n} \Pr(X_{ij} = x_{ij} \mid \omega). \qquad (2)$$

Here the double product indicates taking the product over the range of i and j, namely the outer product as j ranges from 1 to N and the inner product as i ranges from 1 to n. For emphasis, note that it is the independence of the $X_{ij}$ responses for different examinees and for different items that allows the double product in the basic UM IRT model given by Equation 2. Further, $x_{ij}$ denotes the (i, j)th entry of x and is either 1 or 0 according as Item i is answered correctly or incorrectly by Examinee j.
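Equations 1 and 2 translate directly into a likelihood computation. The sketch below is illustrative only: the formula for the positivity factor $S_{ij}$ in terms of α and the item parameters is explained later in the document and is not reproduced here, so the function simply takes S as an input matrix. It also assumes a 1.7-scaled logistic form for P. Function names are hypothetical. Summing logarithms instead of multiplying probabilities is a design choice to avoid numerical underflow, since the double product has n × N factors (20,000 for the 40-item, 500-examinee statistics example).

```python
import math


def P(t: float) -> float:
    """Assumed logistic IRF with the conventional 1.7 scaling constant."""
    return 1.0 / (1.0 + math.exp(-1.7 * t))


def log_likelihood(x, S, theta, c):
    """log f(x | omega) under Equations 1 and 2 (hypothetical helper).

    x[i][j]  : observed 0/1 response of Examinee j to Item i (n-by-N)
    S[i][j]  : positivity factor S_ij, taken as given here (0 < S_ij <= 1)
    theta[j] : residual ability of Examinee j
    c[i]     : completeness parameter of Item i
    """
    total = 0.0
    for i, row in enumerate(x):
        for j, x_ij in enumerate(row):
            p_correct = S[i][j] * P(theta[j] + c[i])  # Equation 1
            # Independence turns the double product of Equation 2 into a double sum of logs.
            total += math.log(p_correct if x_ij == 1 else 1.0 - p_correct)
    return total


# The three-item, two-examinee response matrix used earlier, with hypothetical parameter values.
x = [[1, 0], [1, 1], [0, 0]]
S = [[1.0, 1.0] for _ in range(3)]
print(log_likelihood(x, S, theta=[0.0, 0.0], c=[0.0, 0.0, 0.0]))
```

With every $S_{ij} = 1$ and every $\theta_j + c_i = 0$, each response has probability 0.5, so the result is 6 log 0.5.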

The Core UM Concepts of Positivity and Completeness. In order to understand Equations 1 and 2, which comprise the very essence of the UM, note that the UM postulates two cognitive concepts of fundamental importance and considerable usefulness, namely positivity and completeness. The first factor of Equation 1, $S_{ij}$, models positivity, and the second factor, $P(\theta_j + c_i)$, models completeness.

Indeed, the introduction of completeness, which is modeled by the continuous latent variable θ in the second factor (a many-valued discrete θ could be used just as easily), is unique to the UM among cognitive diagnostic models. Further, the combining of the two fundamental concepts of completeness and positivity in the UM, as reflected in the multiplication of the two factors in Equation 1, also distinguishes the UM from all other IRT-based cognitive diagnostic models. Equations 1 and 2 are now explained.

Completeness. First the second factor $P(\theta_j + c_i)$ of Equation 1 is considered, which models the degree of completeness for Item i and the prescribed attributes of the UM. The parameter $c_i$, which varies from item to item, is the completeness parameter. One core aspect of the UM equations is that, in order to keep the number of parameters per item reasonable, and hence statistically tractable relative to the size of the available data set, the UM intentionally does not try to explicitly model the role of the many minor yet influential latent attributes. An attribute is influential if attribute mastery versus non-mastery changes the probability of answering the item correctly. With these influential but minor attributes omitted, $c_i$ quantifies the relative combined influence of the omitted attributes as compared with the combined influence of the explicitly modeled attributes α upon examinee responding to Item i.

To be precise, suppose that the accurate and complete (in the sense of including all the attributes that in fact influence examinee item performance) cognitive diagnostic model for the university statistics examination discussed above includes 200 attributes; such a model is what a cognitive scientist more interested in basic science than in practical cognitive diagnostics might produce after conducting an intensive and detailed cognitive psychological study of a few of the students in the college introductory statistics course. Suppose that, for the sake of statistical tractability given the limited amount of examinee data available and the fact that the test has only 40 items, the model is restricted to explicitly having 8 attributes in the UM's incidence matrix. Thus 8 attributes are selected that are believed to be important in determining examinee test performance, including all the attributes the instructor wishes to cognitively diagnose. Then the role of $\theta_j + c_i$ is to parsimoniously encode the influence of the 192 omitted, less important attributes for Examinee j and Item i. For clarity, note that in practice one has little idea how many or what the excluded minor attributes are. That is, the user does not need to have figured out what all the minor attributes in a test situation are in order to build a UM, this being a big advantage over traditional cognitive modeling.

It should be noted that the residual ability θ_(j) functions as Examinee j's combined attribute-based ability on the 192 excluded attributes. This modeling technique of allowing θ to parsimoniously “soak up” the influence of the 192 minor attributes is one of the major reasons the UMCD approach is superior to other IRT-based cognitive diagnostic approaches.

Then, the role of c_(i) for an item is to apportion the relative importance, in determining examinee item performance, of the major included attributes α_(j) versus the excluded minor but still influential attributes, which are built into the UM through θ_(j).

Assume, as is standard in IRT modeling, that θ is a standard normal random variable (the well-known “bell-shaped” curve), as shown in FIG. 4.

Note from FIG. 4 that about ⅔ of all examinee abilities lie between −1 and +1, while virtually all lie between −3 and +3. Thus, for example, a θ=0 examinee has average overall ability on the θ composite representing the 192 excluded attributes, while a θ=2 examinee is of very high ability on the excluded attributes.
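The ⅔ and "virtually all" figures can be checked directly from the standard normal distribution function. The following is a minimal Python sketch using only the standard library's `math.erf`; the helper names are illustrative and are not part of the UM procedure.

```python
import math

def std_normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_between(a: float, b: float) -> float:
    """P(a < theta < b) for theta distributed as N(0, 1)."""
    return std_normal_cdf(b) - std_normal_cdf(a)

within_1 = prob_between(-1.0, 1.0)   # roughly 2/3 of examinees
within_3 = prob_between(-3.0, 3.0)   # virtually all examinees
print(f"P(-1 < theta < 1) = {within_1:.3f}")
print(f"P(-3 < theta < 3) = {within_3:.3f}")
```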

The degree of completeness of an item (Item i, say) is quantified by c_(i) in the following manner. For some items c_(i) will be large (for example c_(i)=2.5), indicating that P(θ+c_(i))≈1 for most examinees (as seen by inspecting the IRF of FIG. 1, where P(θ+c_(i))≈1 clearly holds unless an examinee's θ is unusually small). Hence completeness holds, and examinee performance on those items is largely determined by the positivity factor S_(ij), which probabilistically models the influence of the attributes α included in the UM. That is, examinee performance is primarily determined by the important attributes (those explicitly chosen by the user) that make up α. In this case the major explicitly modeled attributes are relatively complete for the items in question.

Similarly, for other items c_(i) will be small (for example c_(i)=0.5), indicating that P(θ+c_(i)) is substantially less than 1 for most examinees. Thus, as expressed by the value of P(θ+c_(i)), the excluded attributes modeled by the residual ability θ play an important role in influencing examinee responses, alongside the included major attributes, which are also quite important. In this case the included modeled attributes are relatively incomplete for the item in question.

Because this is rather abstract and yet is key to understanding the completeness concept, a simple example is given. Consider an examinee of average ability θ=0. Suppose that c_(i)=3, indicating a very complete item for which examinee response behavior is controlled almost entirely by the included attributes. Then note, referring to FIG. 1, that the examinee's chance of correctly applying the excluded minor attributes to the item is given by P(θ+c_(i))=P(3)≈1. Thus the model appropriately lets examinee mastery/non-mastery of the major attributes effectively be the sole determinant of correct examinee performance on the item, as expressed by S_(ij) of Equation 2.
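The completeness example can be made concrete numerically. FIG. 1 is not reproduced here, so the sketch below assumes a logistic IRF P(x) = 1/(1 + exp(-1.7x)); that form, the 1.7 scaling constant (a common IRT convention), and the function names are illustrative assumptions rather than the UM's specification.

```python
import math

def irf(x: float, scale: float = 1.7) -> float:
    """Hypothetical logistic item response function standing in for FIG. 1."""
    return 1.0 / (1.0 + math.exp(-scale * x))

def completeness_factor(theta: float, c: float) -> float:
    """Second factor of Equation 1, P(theta_j + c_i), under the assumed IRF."""
    return irf(theta + c)

theta = 0.0  # an examinee of average residual ability
for c in (3.0, 2.5, 0.5):
    print(f"c_i = {c}: P(theta + c_i) = {completeness_factor(theta, c):.3f}")
```

Under this assumed IRF the factor is close to 1 for c_(i)=3 and c_(i)=2.5 but only about 0.7 for c_(i)=0.5, matching the complete versus incomplete items described above.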

Positivity

The second cognitive concept of fundamental importance in the UM is positivity, which is made explicit in Equation 3 below for S_(ij). S_(ij) gives the probability that the model's listed attributes that are required for Item i according to the incidence matrix (Block 205 of FIG. 2) are applied correctly to the solution of Item i (which requires certain attributes to be mastered) by Examinee j (who has mastered certain attributes):

$S_{ij} = \left[\pi_{i1}^{\alpha_{j1}} \times \pi_{i2}^{\alpha_{j2}} \times \cdots \times \pi_{im}^{\alpha_{jm}}\right] \left[r_{i1}^{1-\alpha_{j1}} \times r_{i2}^{1-\alpha_{j2}} \times \cdots \times r_{im}^{1-\alpha_{jm}}\right] \qquad (3)$

Note that when an α=1, only its corresponding π is a factor in S_(ij) (not its corresponding r), and when an α=0, only its corresponding r is a factor in S_(ij) (not its corresponding π). Thus S_(ij) is the product of m factors, each a π or an r. Here it is to be understood that the m attributes of the above formula are the attributes specified as required by Item i in the item/attribute incidence matrix. Also, α_(j2)=1 or 0 denotes the mastery or nonmastery state, respectively, of Examinee j on Attribute 2, etc.

Recalling Equations 1, 2, and 3, it is seen that the item/attribute incidence matrix is a needed input in determining f(X|ω), as the arrow connecting Block 205 to Block 209 in FIG. 2 indicates. This is because the item/attribute incidence matrix specifies, for each Item i, which m attributes are needed for its solution. In particular, the π's and r's appearing in Equation 3 correspond only to the attributes that are required for Item i.
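As a sketch of Equation 3 and of how the incidence matrix selects its factors, the following Python function computes S_(ij) from one row of the incidence matrix. The argument names and the numeric π and r values are hypothetical, chosen only for illustration.

```python
from typing import Sequence

def positivity_factor(q_row: Sequence[int],
                      alpha: Sequence[int],
                      pi: Sequence[float],
                      r: Sequence[float]) -> float:
    """S_ij of Equation 3 for one examinee j and one item i.

    q_row : incidence-matrix row for Item i (1 = attribute required)
    alpha : mastery vector for Examinee j (1 = mastered, 0 = not mastered)
    pi, r : positivity parameters pi_ik and r_ik for Item i
    Only attributes with q_row[k] == 1 enter the product.
    """
    s = 1.0
    for q, a, p, rr in zip(q_row, alpha, pi, r):
        if q == 1:
            # pi^alpha * r^(1 - alpha): exactly one of the two is a factor
            s *= (p ** a) * (rr ** (1 - a))
    return s

# Hypothetical item requiring Attributes 1 and 3 out of four modeled attributes.
q_item = [1, 0, 1, 0]
pi = [0.90, 0.80, 0.85, 0.90]
r = [0.20, 0.30, 0.25, 0.10]
s_master = positivity_factor(q_item, [1, 0, 1, 1], pi, r)  # pi_1 * pi_3
s_lacks = positivity_factor(q_item, [0, 1, 1, 1], pi, r)   # r_1 * pi_3
print(f"{s_master:.3f} {s_lacks:.3f}")
```

Mastery of Attribute 2 or 4 has no effect here because the incidence row excludes them, which is exactly the role of the arrow from Block 205 to Block 209.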

Definition of the Positivity Parameters π's and r's of Equation 3

The r's and π's are defined as follows:

- r_(ik) = Prob(Attribute k applied correctly to Item i given that the examinee has not mastered Attribute k).

Similarly,

- π_(ik) = Prob(Attribute k applied correctly to Item i given that the examinee has mastered Attribute k).

Interpretation of High Positivity

It is very desirable that items display high positivity. High positivity holds for an item when its r's are reasonably close to 0 and its π's are reasonably close to 1. That is, with high probability an examinee correctly applies the attributes required for the item according to the item/attribute incidence matrix if and only if the examinee has mastered these attributes. For example, when high positivity holds, an examinee lacking at least one of the required attributes for the item is very likely to get the item wrong. Consider an item requiring Attributes 1 and 3 from the statistics test example. If, for instance, an examinee does not know how to interpret a histogram (Attribute 1), but the item requires correct calculation of the average by interpreting a histogram, the examinee is likely to get the item wrong, even if she has mastered averages (Attribute 3). Conversely, an examinee who has mastered both required attributes is likely to get the item right, provided also that θ+c is large, indicating either that the examinee will likely also apply correctly the (possibly many) attributes needed for the item but excluded from the model (i.e., the examinee's θ is large) or that the excluded attributes play only a minor role (i.e., the item's c is large). Thus, if the calculation of the mean from the histogram is straightforward, for instance if the histogram is fairly standard and the calculation of the mean is uncomplicated, then an examinee who has mastered both calculation of averages (Attribute 3) and histograms (Attribute 1) will be likely to get the item right, because the influence of attributes excluded from the model is minor and hence c will be large. In summary, a highly positive and reasonably complete item will be very informative about whether the examinee possesses all of its required attributes versus lacking at least one of them.
This completes the description of the basic parameters of the probability model portion of the UM, that is, of Block 209 of FIG. 2 and of FIG. 3. For further details concerning positivity, completeness, and the role of θ, consult DiBello et al.
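To see positivity and completeness act together on the histogram item (Attributes 1 and 3), the sketch below multiplies S_(ij) by P(θ_(j)+c_(i)). It assumes that Equation 1 takes this product form, as the reference to P(θ_(j)+c_(i)) as its second factor suggests, and it uses a hypothetical logistic IRF in place of FIG. 1; all parameter values are illustrative.

```python
import math

def irf(x, scale=1.7):
    """Hypothetical logistic IRF standing in for FIG. 1 (assumed form)."""
    return 1.0 / (1.0 + math.exp(-scale * x))

def prob_correct(q_row, alpha, pi, r, theta, c):
    """Assumed product form of Equation 1: S_ij * P(theta_j + c_i)."""
    s = 1.0
    for q, a, p, rr in zip(q_row, alpha, pi, r):
        if q:                      # only attributes required by the item
            s *= p if a else rr    # pi_ik if mastered, r_ik if not
    return s * irf(theta + c)

# Item requiring Attribute 1 (histograms) and Attribute 3 (averages)
# out of four modeled attributes; values are made up for illustration.
q_row = [1, 0, 1, 0]
pi = [0.95, 0.90, 0.90, 0.90]
r = [0.10, 0.20, 0.15, 0.20]
master_both = [1, 1, 1, 0]
lacks_histograms = [0, 1, 1, 0]

for c in (2.5, 0.5):
    p_m = prob_correct(q_row, master_both, pi, r, theta=0.0, c=c)
    p_n = prob_correct(q_row, lacks_histograms, pi, r, theta=0.0, c=c)
    print(f"c_i = {c}: masters both {p_m:.2f}, lacks Attribute 1 {p_n:.2f}")
```

With these made-up values, a master of both attributes answers correctly with probability about 0.84 when c_(i)=2.5, versus about 0.09 for an examinee lacking Attribute 1; lowering c_(i) to 0.5 pulls the master's probability down to about 0.60, which is the incompleteness effect described earlier.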

One of the most important and useful aspects of the UM, as contrasted with other IRT-based models, is that completeness and positivity provide a natural and parsimonious way to parameterize the random nature of cognitive examinee responding. It is relatively easy to explain to the user of the UM procedure what these two concepts mean and, moreover, to explain the details of how they are parameterized in the UM.

Inability to Calibrate the DiBello et al UM

Blocks 211, 213, and 215 could not be carried out in the 1995 DiBello et al paper because, in particular, Block 211 could not be carried out, which in turn precluded the two subsequent Blocks 213 and 215. The failure to carry out Block 211 arose because, as of 1995, there were too many parameters in the UM equations compared with the size of a typical test data set to achieve acceptable UM model parameter calibration (recall that calibration simply means estimation of the parameters of the model using the available data). In particular, it was impossible to estimate examinee attribute mastery versus nonmastery (Block 213) and then to use this to do cognitive diagnoses (Block 215), such as informing an examinee of which attributes need further study.

The Above-Described UM as Used in the Preferred Embodiment UMCD of the Present Invention

(The discussion below is presented only for the cognitive diagnosis application; results are identical for the medical or psychiatric application.) The construction of a test comprising test items and the selection of a set of attributes designed to measure examinee proficiency (Blocks 201 and 203) are common to the 1995 UM procedure and the UMCD of the present invention. The building of the item/attribute incidence matrix (Block 205) is likewise common to both. The completeness component P(θ_(j)+c_(i)) is also common to both. That is, in both, the selected attributes forming the incidence matrix are a subset of a larger group of attributes influencing examinee test item performance, with the remainder of the larger group being accounted for in the UM by a residual ability parameter; this is completeness. Positivity is also common to the 1995 UM procedure and the UMCD of the present invention: the model includes parameters describing how the test items depend on the selected set of attributes, by accounting for the probability that an examinee, on an individual test item, may have mastered all the attributes from the selected set required for that item but fail to apply at least one such required and mastered attribute correctly, thereby responding to the test item incorrectly.
Similarly, and also part of the definition of positivity, an examinee who has failed to master at least one attribute required for an individual test item may nevertheless apply the required but unmastered attributes correctly to the item, and also apply the remaining required and mastered attributes correctly, thereby responding to the test item correctly. But the 1995 UM item parameters were not identifiable, whereas the parameters of the UM of the present invention are. Also in common is the administering of the test (Block 207).

Other Prior Art: Probability Model-based Cognitive Diagnostic Procedures; Deterministic Procedures

1. Probability model-based procedures

Most of the important IRT-based (and hence probability model-based) cognitive diagnosis procedures use a Bayesian formulation of a cognitive model and sometimes use a computational tool called Markov Chain Monte Carlo (MCMC) to calibrate them. The UM procedure presented, which forms a core of the present invention, also has a Bayes probability model formulation and also uses MCMC. Thus Bayes modeling, MCMC computation, and needed related concepts are first explained before the other instances of prior art are presented.

Calibration of a Statistical Model

Consider the simple model y=ax, where a is an unknown parameter. Model calibration refers to the use of the available data, which is viewed as generated by the model, to statistically estimate the unknown model parameters. It must be understood that without calibration, probability models are useless for doing real world inference such as cognitive diagnosis. In particular, model calibration is necessary because the parametric model y=ax is of no use in carrying out the desired statistical inference of predicting y from x until the parameter a is calibrated (estimated) from real data. Thus if a=2 is an accurate estimate provided by the data, the now calibrated model y=2x is useful in predicting y from x, provided this simple straight line model does a relatively accurate (unbiased) job of describing the real world setting of interest.
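For the y=ax example, calibration can be as simple as least squares. The Python sketch below uses hypothetical observations (not data from this document) and least squares as one possible estimation method, chosen only to show what "calibrating a" means in practice.

```python
def calibrate_slope(xs, ys):
    """Least-squares estimate of a in y = a*x (a line through the origin).

    Minimizing sum((y - a*x)^2) over a gives a_hat = sum(x*y) / sum(x^2).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical observations scattered around y = 2x
a_hat = calibrate_slope(xs, ys)
print(f"calibrated model: y = {a_hat:.2f} x")
```

Once a_hat is in hand, the calibrated model can be used for prediction, which is the step the uncalibrated y=ax cannot support.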

The need for a new statistical modeling approach in complex, real-world settings

A major practical problem often standing in the way of applicable statistical modeling of complex real world settings is that modeling realism demands correspondingly complex, and hence many-parameter, models, while the amount of data available is often not sufficient to support reliable statistical inference based on such complex models. The more parameters in a model, the greater the statistical uncertainty in the estimated (that is, calibrated) values of these parameters. Thus 400 data points produced little uncertainty in the estimation of the drug cure probability p in the one-parameter model of Example 1. But if there were instead 30 parameters in a problem where the number of data points is 400, then the level of uncertainty in the parameter estimates needed to calibrate the model would likely render the model almost useless for the desired statistical inference.

In many complex settings where appropriate statistical modeling and analysis is needed, an unacceptable dilemma exists. On the one hand, the limited data available can be used to calibrate well a biased model (an overly simplified model that distorts reality), because there is ample data to accurately estimate its relatively few parameters. For example, estimating a=2 in the model y=ax is of no use for prediction if reality is well described only by the more parametrically complex four-parameter model y=c+ax+bx²+dx³. On the other hand, the four parameters of this cubic model, which accurately portrays reality, may be only poorly calibrated because there is not enough data to estimate them well. To illustrate, if the true reality is y=5+9x+12x²+4x³ and the model is poorly calibrated as y=3+4x+20x²+2x³ using the limited available data, the calibration is so bad that the calibrated cubic polynomial model will be of no use for accurate prediction of y from x, even though reality is well described by a cubic polynomial model.

This dilemma of bad model with good calibration versus good model with bad calibration is a particular instance of what statisticians sometimes call the variance/bias tradeoff. Under either unacceptable modeling compromise, valid inferences (i.e., inferences using a relatively unbiased and relatively well calibrated model) about the real world setting of interest are simply not possible.

Bayes Probability Modeling (a Practical Answer to Modeling Complex Real World Settings that Require Many Parameters) as a Major Statistical Modeling Technique

Fortunately, recent developments in statistics offer a solution to the challenging dilemma of probability modeling of complex settings requiring relatively large numbers of parameters in their models. In particular, these developments apply to the probability modeling of the inherently complex cognitive diagnosis setting. Once a practitioner recasts parametrically complex statistical models as Bayes models, their newly acquired Bayesian nature allows them to be as well calibrated as if they had relatively few parameters, and yet they can accurately model complex settings. More specifically, this Bayesian modeling approach often allows circumventing the problem of a model being too parametrically complex to be reliably calibrated using available data. Indeed, in one of the major sources on Bayesian statistical analysis, Gelman, Carlin, Stern, and Rubin dramatically state (Gelman, A., Carlin, J., Stern, H., and Rubin, D., 1995, Bayesian Data Analysis, London, Chapman and Hall), “As we show in this chapter, it is often sensible to fit hierarchical (Bayesian) models with more parameters than there are data points”. In particular, hierarchical Bayes modeling can be applied in IRT modeling of complex settings producing test data. An important paper by Richard Patz and Brian Junker (Patz, R. and Junker, B., 1999, A Straightforward Approach to Markov Chain Monte Carlo Methods for Item Response Models, Journal of Educational and Behavioral Statistics, 24, 146-178) effectively makes the case for the use of Bayes approaches when doing complex IRT modeling. More precisely, using a Bayesian model framework combined with straightforward MCMC computations to carry out the necessary Bayes calculations is highly effective for analyzing test data when complex IRT models are needed (Patz et al). This is precisely the invention's situation when trying to use test data to carry out cognitive diagnoses. Further, as suggested above, the UM incorporating an expertly crafted Bayesian approach has the potential to allow the full information locked in the test data to be extracted for cognitive diagnostic purposes.

Bayes Modeling Example

Although the notion of a Bayes probability model is a complex and sophisticated concept, a simple example will clarify the basic idea of what a Bayes probability model is and how its statistical analysis proceeds.

Example 3

Example 1 (modified). Consider the drug trial setting of Example 1. Suppose that in addition to the data there is powerful prior scientific evidence that the true unknown p satisfies 0.5≤p≤0.9 and, moreover, that values of p in this range become more improbable the further they are from a cure rate of 0.7. The Bayes approach quantifies such probabilistic knowledge possessed by the investigator about the likelihood of various values of the parameters of the model by assigning a prior probability distribution to the parameter p. That is, a Bayes model puts a probability distribution on the model's parameter(s), where this distribution reflects how likely the user believes (based on prior knowledge and/or previous experience) the various values of the unknown parameter are. Suppose the prior distribution for p is given as a “density” in FIG. 5.

For example, it can be determined from FIG. 5 that:

- Probability(0.7<p<0.8) = area between 0.7 and 0.8 = 0.4
- Probability(0.8<p<0.9) = area between 0.8 and 0.9 = 0.1

Thus, although the lengths of the intervals (0.7, 0.8) and (0.8, 0.9) are identical, the probability of the unknown parameter p falling in the interval (0.7, 0.8) is much higher than the probability of its falling in the interval (0.8, 0.9), a fact which will influence our use of the data to estimate p. More generally, the values of p become much more unlikely as p moves away from 0.7 towards either 0.5 or 0.9. Clearly, this prior distribution pulls the estimated p closer to 0.7 than the estimate p=0.75 obtained when a Bayes approach is not taken (and hence p does not have a prior distribution to modify what the data alone suggest as the estimated value of p). The Bayes approach simply does not allow the data set to speak entirely for itself when it comes to estimating model parameters.

Converting an Ordinary Probability Model into a Bayes Probability Model

It must be emphasized that converting an ordinary probability model with parameters into a Bayes probability model with prior distributions on the parameters amounts to developing a new probability model that extends the old non-Bayes probability model. Indeed, converting a non-Bayes model to a Bayes model is not rote or algorithmic but rather is more like “guild-knowledge”, in that it requires knowledge of Bayes modeling and especially of the real world setting being modeled. Choosing an effective Bayes model can have a large influence on the accuracy of the statistical inference.

Choosing The Prior Distribution

In many Bayes modeling applications, in particular the Bayes UM approach of the present invention, the prior distributions are chosen carefully to be informative about the parameters while not being over-informative in the sense of putting more weight on the prior information than is justified. For example, in the Bayes example described previously, a somewhat less informative prior than that of FIG. 5 is given in FIG. 6; such a prior is often called a vague prior because it is rather unobtrusive in its influence over the resulting statistical inference. In this case of a vague prior, the non-Bayes inference that p=0.75 is moved only slightly towards 0.7.

Finally, the prior of FIG. 7 is totally uninformative about the likely value of p.

As would be suspected, when a Bayesian approach is taken with the totally uninformative prior of FIG. 7, the non-Bayesian inference in Example 1 that p=0.75 is in fact unaltered.

Example 3 (Continued)

Now the Bayes analysis of Example 3 using the triangular prior presented before is continued. Given this Bayes probability model and data that produced 75% cures, a Bayesian statistical analysis would estimate, in a formal way explained below, that p=0.72 (instead of the non-Bayes estimate of 0.75). This is because, through the provided prior distribution, the inference process includes the fact that values of p like 0.75 are much less likely than values closer to 0.7. That is, the current Bayes estimate of p=0.72 resulted from combining the non-Bayesian analysis of the data from Example 1, suggesting p=0.75, with prior knowledge that a p as big as 0.75 is relatively unlikely compared to a p closer to 0.7. The mathematically derived Bayes compromise between these two sources of information (prior and data-based) produces the Bayes inference that p=0.72.
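The p=0.72 result can be reproduced approximately by maximizing likelihood × prior over a fine grid. FIG. 5 is not reproduced here, so the sketch assumes its prior is a symmetric triangle on [0.5, 0.9] peaking at 0.7, and it uses the 30 cures out of 40 trials cited for this example; the function names are hypothetical.

```python
import math

def triangular_prior(p, lo=0.5, mode=0.7, hi=0.9):
    """Assumed shape of the FIG. 5 prior: a symmetric triangle on [lo, hi]."""
    if p <= lo or p >= hi:
        return 0.0
    height = 2.0 / (hi - lo)            # makes the total area equal 1
    if p <= mode:
        return height * (p - lo) / (mode - lo)
    return height * (hi - p) / (hi - mode)

def log_posterior(p, cures=30, trials=40):
    """log of (binomial likelihood x prior), up to an additive constant."""
    prior = triangular_prior(p)
    if prior == 0.0:
        return -math.inf
    return cures * math.log(p) + (trials - cures) * math.log(1.0 - p) + math.log(prior)

grid = [i / 10000 for i in range(1, 10000)]      # candidate values of p in (0, 1)
map_estimate = max(grid, key=log_posterior)      # posterior mode
print(f"maximum likelihood estimate: {30 / 40:.3f}")
print(f"posterior mode with the triangular prior: {map_estimate:.3f}")
```

Under these assumptions the posterior mode comes out near 0.722, close to the 0.72 stated above, while the likelihood alone peaks at 0.75.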

Basic Bayes Inference Paradigm: Schematic and Formula

The flow chart of FIG. 8 shows the basic Bayes inference paradigm. As with all statistical procedures, it starts with observed data (Block 801 of FIG. 8).

Computationally, the Bayes inference paradigm is as follows. Let X denote the observed data (Block 801) and ω denote the parameters of the Bayes model. Block 807 indicates the Bayes probability model, which is the combination of the prior distribution f(ω) on the model parameters (Block 803) and the likelihood probability distribution f(X|ω) (Block 805). Note that both X and ω are likely to be high dimensional in practice. Then the posterior distribution of the parameters given the data (indicated in Block 809) is computed as follows:

$f(\omega \mid X) = \frac{f(X \mid \omega)\, f(\omega)}{\int f(X \mid \omega)\, f(\omega)\, d\omega} \qquad (4)$
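For a single parameter, the integral in the denominator of Equation 4 can be approximated by a sum over a grid, which makes the formula easy to check. The Python sketch below does this for the drug-trial likelihood (30 cures in 40 trials) with a flat prior in the spirit of FIG. 7; the helper names are hypothetical. The normalized posterior keeps its mode at 0.75, consistent with the statement that a totally uninformative prior leaves the non-Bayes estimate unaltered.

```python
import math

def posterior_on_grid(prior, likelihood, grid):
    """Discretized Equation 4: f(w|X) = f(X|w) f(w) / integral of f(X|w) f(w) dw.

    The denominator is approximated by a Riemann sum over an evenly spaced
    grid, which is feasible only because w is one-dimensional here.
    """
    step = grid[1] - grid[0]
    numer = [likelihood(w) * prior(w) for w in grid]
    denom = sum(numer) * step
    return [n / denom for n in numer]

def likelihood(p, cures=30, trials=40):
    """Binomial likelihood of 30 cures in 40 trials."""
    return math.comb(trials, cures) * p**cures * (1 - p)**(trials - cures)

def uniform_prior(p):
    """Totally uninformative prior on (0, 1)."""
    return 1.0

grid = [i / 2000 for i in range(1, 2000)]        # evenly spaced points in (0, 1)
post = posterior_on_grid(uniform_prior, likelihood, grid)
area = sum(post) * (grid[1] - grid[0])
mode = grid[max(range(len(grid)), key=post.__getitem__)]
print(f"posterior integrates to {area:.4f}; posterior mode = {mode:.4f}")
```

Grid sums of this kind become infeasible as the number of parameters grows, which is the computational difficulty that motivates MCMC.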

Here, f(ω)≥0 is the prior distribution (referred to as a density) on the parameters, specially created for the Bayes model. The choice of the prior is up to the Bayes practitioner and is indicated in Block 803. Also, in Equation 4, f(X|ω) is the usual likelihood probability distribution (see Block 805; the notion of a likelihood is explained below) that is also at the heart of a non-Bayesian statistical inference about ω for observed data X. The prior distribution on the parameters and the likelihood probability distribution together constitute the Bayes probability model (Block 807). The likelihood probability distribution f(X|ω)≥0 describes the random mechanism by which each particular parameter value ω produces the observed data X, whereas the prior distribution f(ω) describes how likely the practitioner believes each of the various parameter values is.

In Equation 4, f(ω|X) denotes the posterior probability distribution of ω when the data X has occurred. It is “posterior” because it is the distribution of ω as modified from the prior by the observed data X (posterior to X). All Bayesian statistical inferences are based on obtaining the posterior distribution f(ω|X) via Equation 4, as indicated in Block 811. For example, the inference that p=0.72 in Example 3 was the result of finding the value of p that maximizes the posterior f(p|30 cures out of 40 trials).

A key point in actually carrying out a Bayesian analysis is that computing the integral in the denominator of Equation 4 when ω is high dimensional (that is, when there are many model parameters) is often difficult to impossible, in which case doing a Bayesian inference is also difficult to impossible. Solving this computational issue will be seen to be important for doing cognitive diagnoses for test data X when using a Bayesian version of the UM in the present invention.

Bayesian Statistical Methods Using Markov Chain Monte Carlo

The use of complex Bayes models with many parameters has become a reasonable foundation for practical statistical inference because of the rapidly maturing MCMC simulation-based computational approach. MCMC is an excellent computational tool for statistically analyzing data sets assumed to have been produced by such Bayes models, because it allows bypassing the computation of the complicated posterior distribution of the parameters (Equation 4) required in analytical computational approaches. In particular, the specific MCMC algorithm used in the invention (see the Description of the Preferred Embodiments section), namely the Metropolis-Hastings within Gibbs-sampling algorithm, allows bypassing the computation of the complex integral in the denominator of Equation 4 (via the Metropolis-Hastings algorithm) and simplifies computing the numerator of Equation 4 (via the Gibbs sampling algorithm). Before the advent of MCMC, complex Bayes models were usually only useful in theory, regardless of whether the practitioner took a non-Bayesian or a Bayesian approach.
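The way Metropolis-Hastings sidesteps the denominator of Equation 4 can be seen in one dimension. The sketch below is a plain random-walk Metropolis sampler for the single parameter p of the drug-trial example, not the Metropolis-Hastings within Gibbs sampler of the preferred embodiment; the triangular prior shape, step size, burn-in length, seed, and names are assumptions for illustration. Each accept/reject decision uses only the ratio of likelihood × prior at two values of p, so the normalizing integral cancels and is never computed.

```python
import math
import random

def log_unnormalized_posterior(p, cures=30, trials=40):
    """log[f(X|p) f(p)] up to an additive constant.

    Uses an assumed unnormalized triangular prior on (0.5, 0.9) peaking at 0.7.
    The denominator of Equation 4 is never needed: only differences of this
    quantity at two values of p enter the acceptance step.
    """
    if not 0.5 < p < 0.9:
        return -math.inf                       # zero prior density outside the support
    prior = (p - 0.5) if p <= 0.7 else (0.9 - p)
    return cures * math.log(p) + (trials - cures) * math.log(1.0 - p) + math.log(prior)

def metropolis(n_draws, start=0.7, step=0.05, seed=1):
    """Random-walk Metropolis sampler for the single parameter p."""
    rng = random.Random(seed)
    current, current_lp = start, log_unnormalized_posterior(start)
    draws = []
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)          # symmetric proposal
        proposal_lp = log_unnormalized_posterior(proposal)
        # Accept with probability min(1, posterior ratio); the normalizing
        # integral cancels in the ratio.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws

draws = metropolis(20000)[2000:]                 # discard burn-in draws
posterior_mean = sum(draws) / len(draws)
print(f"posterior mean of p: {posterior_mean:.3f}")
```

A histogram of `draws` approximates the posterior curve of FIG. 10 without ever evaluating the integral in Equation 4.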

Currently the most viable way to do cognitive diagnoses using examinee test response data and complex Bayes modeling of such data is to analyze the data using an MCMC computational simulation algorithm (see Chapter 11 of Gelman et al for a good description of the value of MCMC in Bayesian statistical inference). Once a Bayesian statistical model has been developed for the specific setting being modeled, it is tedious but relatively routine to develop an effective MCMC computational procedure to obtain the posterior distribution of the parameters given the data. An attractive aspect of Bayes inference is that the computed posterior distribution provides both model calibration of unknown parameters and the backbone of whatever inference is being carried out, such as cognitive diagnoses concerning attribute mastery and nonmastery. Excellent general references for MCMC computation of Bayes models are Gelman et al and Gilks, W.; Richardson, S.; Spiegelhalter, D. (1996) Markov Chain Monte Carlo in Practice. Boca Raton. Chapman & Hall/CRC. A reference for using MCMC computation of Bayes IRT models (the Bayes UM belonging to the IRT family of models) is Patz et al. Indeed, as the Patz et al title, “A Straightforward Approach to Markov Chain Monte Carlo Methods for Item Response Models”, suggests, the development and use of MCMC for Bayes IRT models is accessible to practitioners of IRT and educational measurement, assuming the Bayes IRT model has been constructed.

Likelihood-Based Statistical Inference

Before understanding the computational role of MCMC, it is necessary to understand how a Bayesian inference is computationally carried out. This in turn requires understanding how a likelihood-based inference is computationally carried out, which is explained now. A core concept in statistics is that, given a specific data set, a maximum likelihood (ML) approach to parameter estimation is often taken. Basically, this means that the value of a model parameter is inferred to be the value that is most likely to have produced the observed data set. In statistical modeling, the fundamental assumption is that the given model has produced the observed data, for some specific value(s) of its parameter(s). This idea is simple, as the following illustration shows. If 75% cures are observed in the medical data of Example 1 above, then a theoretical cure rate (probability) of p=0.2 is extremely unlikely to have produced such a high cure rate in the data, and similarly, p=0.97 is also extremely unlikely to have produced such a relatively low cure rate in the data. Going beyond this informal reasoning, elementary calculus shows that p=0.75 is the value of the unknown parameter most likely to have produced a 75% cure rate in the data. This statistical estimate that p=0.75 is a simple example of maximum likelihood-based inference.

The heart of a likelihood-based inference is a function describing, for each possible value of the parameter being estimated, how likely the data were to have been produced by that value. The value of the parameter that maximizes this likelihood function, or likelihood probability distribution (which is f(X|ω) in the Bayes Equation 4 above), then becomes its maximum likelihood estimate. f(X|ω) is best thought of as the probability distribution of the data X for the given parameter(s) ω. For example, the likelihood function for 30 cures out of 40 trials is given in FIG. 9, showing that p=0.75 indeed maximizes the likelihood function and is hence the maximum likelihood estimate of p.
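The likelihood reasoning above can be checked numerically for 30 cures in 40 trials. This minimal Python sketch (function names hypothetical) evaluates the binomial likelihood at a few values of p and finds its maximum on a grid, which lands on the closed-form estimate 30/40 = 0.75.

```python
import math

def likelihood(p, cures=30, trials=40):
    """Binomial likelihood f(X | p) of observing `cures` cures in `trials` trials."""
    return math.comb(trials, cures) * p**cures * (1 - p)**(trials - cures)

# p = 0.2 and p = 0.97 give tiny likelihoods relative to p = 0.75.
for p in (0.2, 0.5, 0.75, 0.97):
    print(f"f(X | p={p}) = {likelihood(p):.3g}")

grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)   # grid-search maximum likelihood estimate
print(f"maximum likelihood estimate: {p_hat}")
```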

Bayesian Likelihood Based Statistical Inference

This is merely likelihood-based inference as modified by prior belief or information about the likelihood of various parameter values (as expressed by the prior probability distribution; examples of such priors are shown in FIGS. 5, 6, and 7), as illustrated in FIG. 10. “Prior” refers to information available before (and in addition to) information coming from the collected data itself. In particular, the posterior probability distribution is the function showing the Bayes likelihood distribution of a parameter resulting from “merging” the likelihood function for the actually observed data with the Bayes prior. For instance, in Example 3 with the triangular prior distribution for p of FIG. 5 as before, FIG. 10 simultaneously shows the likelihood function for p, the triangular prior for p, and the Bayes posterior distribution (also called the Bayes likelihood distribution) for p resulting from this prior and from having observed 30 cures out of 40 trials in the data. Recall that Equation 4 gives a formula for the needed posterior distribution function for a given prior and likelihood probability function. Note from the posterior distribution in FIG. 10 that the estimate of p obtained by maximizing the posterior distribution is approximately 0.72, as opposed to the 0.75 that results from the maximum likelihood estimate that maximizes the likelihood function.

The Intractability of Computing the Posterior Distribution in Complex Bayesian Statistical Analyses as Solved by MCMC

As already stated, there is often an enormous practical problem in computing the posterior distribution in complex Bayesian analyses. For most complex Bayes problems, the computation needed to produce the required posterior distribution (how likely the various possible values of the unknown parameters are) involves an intractable multiple integral that is simply far too complex for direct computation, even with the high speed computing currently available.

In particular, MCMC is a tool to simulate the posterior distribution needed to carry out a Bayesian inference in many otherwise intractable Bayesian problems. In science and technology, a “simulation” is something that substitutes for direct observation of the real thing; in our situation the substitution is for the direct computation of the Bayes posterior distribution. Then, by observing the results of the simulation, it is possible to approximate the results of direct observation of the real thing.

To illustrate the idea of Monte Carlo simulation in action, let's consider a simple simulation approach to evaluating a very simple integral, which can in fact be easily done directly using elementary calculus.

Example 4

Evaluate ∫ x e^(−x) dx over the range 0<x<∞. This integral is solved by simulating a very large number of independent observations x from the exponential probability density f(x)=e^(−x) (shown in FIG. 11). Then the average of this simulated data is computed.

Because of the fundamental statistical law of large numbers (e.g., a fair coin comes up heads about ½ of the time if we toss it a large number of times), this data-produced average will be close to the theoretical mean of the exponential density (the first moment, or center of gravity, of f(x)), which is given by the integral. For example, if five simulated numbers are 0.5, 1.4, 2.2, 0.9, and 0.6, then we estimate the integral to be the average of the simulated numbers, 1.12, whereas the integral's exact value is 1. Of course, if high accuracy were required, then it would be desirable to do 100, 400, or even 1000 simulations, rather than five. Thus this Monte Carlo approach allows accurate evaluation of the unknown integral without any theoretical computation.
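Example 4 translates directly into a few lines of Python. The sketch draws from the exponential density with the standard library's `random.Random.expovariate` and averages the draws; the function name and seed are arbitrary choices. Larger n should bring the estimate closer to the exact value 1, in line with the law of large numbers.

```python
import random

def mc_integral_x_exp(n, seed=0):
    """Monte Carlo estimate of the integral of x*e^(-x) over (0, inf).

    Draw n independent values from the exponential density f(x) = e^(-x)
    and average them; by the law of large numbers the average approaches
    the mean of f, which is exactly the integral (true value 1).
    """
    rng = random.Random(seed)
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

# The five draws quoted in the text average to 1.12.
print(f"{sum([0.5, 1.4, 2.2, 0.9, 0.6]) / 5:.2f}")
for n in (5, 1000, 100000):
    print(n, round(mc_integral_x_exp(n), 3))
```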

But for complex, many-parameter Bayes models, this independent-replications Monte Carlo simulation approach usually fails to be practical. As a viable alternative, MCMC simulation may be used, thereby avoiding the complex intractable integral needed to solve for the posterior distribution in a Bayes statistical analysis. In particular, MCMC simulation estimates the posterior distribution of several statistical cognitive diagnostic models. Each such MCMC uses as input the Bayesian structure of the model (UM or other) and the observed data, as in the basic Bayes formula of Equation 4. Recall that the Bayesian structure of the model refers to the prior distribution and the likelihood probability distribution together.
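To make the MCMC idea concrete, here is a minimal random-walk Metropolis sampler (one particular MCMC algorithm, chosen here for simplicity) for the drug trial parameter p=Prob(cure) with 30 cures out of 40 trials. It assumes a uniform prior on p, which is not necessarily the prior used for FIG. 10, so its posterior mean (about 31/42 ≈ 0.738) differs from the 0.72 quoted there. A one-parameter posterior like this one could be handled directly; the sampler is shown only because the same accept/reject logic still works when the normalizing integral is intractable, since that constant cancels in the acceptance ratio. The function names and tuning values are hypothetical.

```python
import math
import random


def log_posterior(p: float, cures: int = 30, trials: int = 40) -> float:
    """Unnormalized log posterior of p = Prob(cure) under an assumed uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return cures * math.log(p) + (trials - cures) * math.log(1.0 - p)


def metropolis(num_iter: int = 50_000, step: float = 0.05, seed: int = 1) -> list[float]:
    """Random-walk Metropolis chain whose stationary distribution is the posterior of p."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p)
    samples = []
    for _ in range(num_iter):
        proposal = p + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); the normalizing constant cancels.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples


if __name__ == "__main__":
    kept = metropolis()[5000:]  # discard burn-in
    print(sum(kept) / len(kept))
```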

Non-UM Prior Art Examples

Now that the necessary conceptual background of statistical concepts and data computational techniques (especially Bayes probability modeling and MCMC) has been explained and illustrated, the relevant prior art is described (in addition to the UM), consisting of certain other proposed or implemented cognitive diagnostic procedures.

Four non-UM, model-based statistical cognitive approaches are described (that is, the methods are based on a probability model for examinee responding to test items) that can do cognitive diagnosis using simply scored test data. These seem to be the main statistical approaches that have been developed to the point of actually being applied. It is significant to note that only Robert Mislevy's approach seems to have been placed in the commercial arena, and then only for complex and very specialized applications (such as dental hygienist training) based on complex item types rather distinct from the cognitive diagnosis of simple right/wrong scored test items. The four approaches are:

1. Robert Mislevy's Bayes net evidence-centered approach
2. Kikumi Tatsuoka's Rule-space approach
3. Susan Embretson's Generalized Latent Trait Model (GLTM)
4. Brian Junker's Discretised GLTM

Robert Mislevy's Bayes Net Approach

The Bayes net approach is considered first. Two excellent references are Mislevy, R., 1995, Probability based inference in cognitive diagnosis, in Nichols et al., Cognitively Diagnostic Assessment, Mahwah, N.J., Lawrence Erlbaum; and Mislevy, Robert and Patz, Richard, 1998, Bayes nets in educational assessment: where the numbers come from, Educational Testing Service technical report, Princeton, N.J. Like the Bayes UM approach of the invention (see the Description of the Preferred Embodiments section), this is a Bayesian model-based statistical method. Although usually applied in settings other than those where the primary data are simply scored (such as right/wrong) examinee responses to ordinary test questions, it can be applied in such settings, as shown in the research reported in Section 5 of Mislevy et al. Crucially, although it does assume latent attributes, as does the UM, it does not use the concepts of item/attribute positivity or incompleteness (and hence does not introduce θ to deal with incompleteness) that the Bayes UM of the invention uses. The model-simplifying role played in UM methodology by θ and the positivity parameters (the π's and r's), which makes the UM used in the invention tractable, is instead played in the Bayes net approach by graph-theoretic techniques that reduce the parametric complexity of the Bayes net's probability tree of conditional probabilities linking latent attribute mastery states with examinee responses to items. These techniques are in fact difficult for someone who is not a graph-theory expert (as is true of most cognitive diagnostic users) to use effectively.

The Educational Testing Service (ETS) is commercially marketing the Bayes net technology under the name Portal, and indeed has used Portal in the training of dental hygienists. But this approach is not easy for practitioners to use on their own, for reasons already stated. In particular, exporting the approach for reliable independent use outside of ETS has been difficult and requires serious training of the user, unlike the Bayes UM methodology of the present invention. Further, it may not have the statistical inference power that the present UM invention possesses, especially because of the important role played by each of positivity, incompleteness (with the introduction of θ), and the positive correlational structure that the Bayes UM of the present invention places on the attributes (the importance of which is explained below in the Description of the Preferred Embodiments section). A schematic of the Bayes net approach is shown in FIG. 12. It should be noted that Blocks 201, 203, and 207 of the FIG. 12 Bayes net approach are in common with the DiBello et al 1995 approach (recall FIG. 2). Block 1201 is just Block 807 of FIG. 8 of the general Bayes inference approach, specialized to the Bayes net model. Similarly, Block 1203 is a special case of computing the Bayes posterior (Block 809 of FIG. 8), in fact using MCMC. Finally, the cognitive diagnostic step (Block 1205) is just a special case of the Bayes inference step (Block 811).

Kikumi Tatsuoka's Rule Space Approach

Two good references are Tatsuoka, K., 1983, Rule space: an approach for dealing with misconceptions based upon item response theory, Psychometrika 20, 34-38; and Tatsuoka, Kikumi, 1990, Toward an integration of item response theory and cognitive error diagnosis, Chapter 18 in Diagnostic Monitoring of Skill and Knowledge Acquisition, Mahwah, N.J., Lawrence Erlbaum. A schematic of the Rule Space approach is shown in FIG. 13.
The rule space model for the randomness of examinee responding for each possible attribute vector structure is in some ways more primitive than, and is much different from, the Bayes UM of the present invention. It is based entirely on a probability model of random examinee errors, called “slips” by Tatsuoka. Thus the concept of completeness is absent, and the concept of positivity is expressed entirely as the possibility of slips (mental glitches). The computational approach taken is typically Bayesian. Its fundamental idea is that an actual response to the test items should be like the “ideal” production-rule-based deterministic response (called the ideal response pattern) dictated by the item/attribute incidence matrix and the examinee's true cognitive state as characterized by his/her attribute vector, except for random slips. Cognitive diagnosis is accomplished by assigning an actual examinee response pattern to the “closest” ideal response pattern via a simple Bayesian approach. Thus the rule space approach is basically a pattern recognition approach. A rule space cognitive diagnosis is computationally accomplished by a complex dimensionality reduction of the n-dimensional response space (because there are n items) to the two-dimensional “rule space” (see Block 1303 and the two Tatsuoka references for details). This produces a two-dimensional Bayesian model (Block 1301, which is analogous to the general Bayes model building Block 807 of FIG. 8). This reduction to the low-dimensional “two space” allows one to carry out the needed Bayes computation directly (see Block 1305) without having to resort to MCMC. Then the attribute state that best predicts the assigned ideal response pattern is inferred to be the examinee's cognitive state, thus providing a cognitive diagnosis. This approach has no completeness, no positivity, and no positive correlational structure imposed on the attributes, and its probability-of-slips distribution is based on some assumptions that seem somewhat unrealistic.
In particular, the Bayes UM approach of the present invention should outperform the Rule-space approach for the above reasons. The two approaches are very distinct, both in their probability models for examinee response behavior and in the Bayes calibration and diagnostic algorithm used. It should be noted that Blocks 201, 203, 205, and 207 are in common between the DiBello et al 1995 UM approach and the Rule-space approach. As with all cognitive diagnostic approaches, the last block, here Block 1307, is to carry out the actual cognitive diagnosis.
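The “closest ideal response pattern” idea can be sketched as follows. This is a simplified, hypothetical illustration: it builds ideal response patterns from a made-up item/attribute incidence matrix `Q` (an item is answered correctly exactly when all of its required attributes are mastered) and matches an observed pattern by Hamming distance over all attribute vectors. Tatsuoka's actual procedure instead works in the reduced two-dimensional rule space with a Bayesian classification, which this sketch does not reproduce.

```python
from itertools import product

# Hypothetical item/attribute incidence matrix: Q[i][k] = 1 if item i requires attribute k.
Q = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]


def ideal_response(alpha, q_matrix):
    """Deterministic ideal pattern: item correct iff every required attribute is mastered."""
    return tuple(int(all(a >= q for a, q in zip(alpha, row))) for row in q_matrix)


def closest_attribute_states(observed, q_matrix):
    """Attribute vectors whose ideal pattern is nearest (Hamming distance) to `observed`."""
    num_attrs = len(q_matrix[0])
    best, best_states = None, []
    for alpha in product((0, 1), repeat=num_attrs):
        ideal = ideal_response(alpha, q_matrix)
        dist = sum(o != e for o, e in zip(observed, ideal))
        if best is None or dist < best:
            best, best_states = dist, [alpha]
        elif dist == best:
            best_states.append(alpha)
    return best, best_states


if __name__ == "__main__":
    # Item 4 correct but item 3 wrong: two attribute states tie at distance 1.
    print(closest_attribute_states((1, 1, 0, 1), Q))  # (1, [(1, 1, 0), (1, 1, 1)])
```

Ties like the one in the usage example are exactly where a probabilistic model of slips is needed to decide between candidate cognitive states.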

Susan Embretson's Generalized Latent Trait Model (GLTM)

Two good references are Chapter 11 of Susan Embretson's 2000 book, Item Response Theory for Psychologists, Erlbaum, N.J., and Embretson, Susan, 1997, Multicomponential response models, Chapter 18 in Handbook of Modern Item Response Theory, edited by van der Linden and Hambleton, N.Y., Springer. This approach is distinct from the Bayes UM of the present invention. It assumes that the attributes to be inferred are continuous rather than binary (0/1) as is assumed in the Bayes UM, and it has no incompleteness component and no positive correlational attribute structure. Because it treats attributes as continuous, it tends to be applied to continuous latent abilities like “working memory” capacity and time until task completion. It uses, at least in its published descriptions, a computational approach called the EM algorithm, and thus the GLTM is not recast in a Bayesian framework. Although in principle applicable to ordinary simply scored test data, that does not seem to be its primary focus of application. A schematic of the GLTM is shown in FIG. 14. Block 1401 is similar to Block 201 of FIG. 2, except that here the attributes are continuous. Blocks 203 and 207 are in common with the other prior art procedures. Block 1403 is analogous to the FIG. 2 UM Block 209, Block 1405 is analogous to the FIG. 2 UM Block 213, and finally, in common with all procedures, the last Block 1407 is the carrying out of a cognitive diagnosis.

Brian Junker's Discrete (0/1) Version of GLTM

The idea is to replace Embretson's continuous latent attributes in her GLTM by binary ones and keep the general structure of the model the same. A good reference is Junker, Brian, 2001, On the interplay between nonparametric and parametric IRT, with some thoughts about the future, Chapter 14 in Essays on Item Response Theory, edited by A. Boomsma et al., New York, Springer. Perhaps the primary distinction between this newer approach and the Bayes UM approach of the present invention is that Discrete GLTM does not have an incompleteness component. Further, it has no positive correlational attribute structure. Finally, its positivity structure is much simpler than that of the Bayes UM of the present invention, in that for Discrete GLTM the degree of positivity of an attribute is not allowed to depend on which test item is being solved. The computational approach for Discrete GLTM is MCMC.

Contrasting flow diagrams have been provided only for the first three statistical procedures just described (the Junker Discrete GLTM schematic being almost identical to the Embretson GLTM schematic).

The most fundamental difference between the various prior art approaches and the present invention is always that the model is different, although there are other distinguishing characteristics too.

2. Deterministic Cognitive Model Based Procedures

There are numerous approaches that use a deterministic cognitive diagnosis approach. The statistical approaches are by their statistical nature superior to any deterministic approaches (that is, rule-based, data mining, artificial intelligence (AI), expert systems, neural-net based, etc.). All deterministic approaches lack a deep and valid method for avoiding over-fitting the data, and thus erroneously conclude attribute masteries and non-masteries where in fact the supporting evidence for such conclusions is weak.

Further, these deterministic approaches all have models that are parametrically far too complex to support model calibration using ordinary simply scored test data. These models are numerous and are simply too far afield to be useful for cognitive diagnosis in the simple test data environment.

Part 3. Prior Art in the Medical and Psychiatric Area

Above, only the educationally oriented cognitive diagnostic setting has been considered. But cognitively diagnosing an examinee based on performance on observed items and medically diagnosing a patient have a similar structure. In both, the attempt is to measure a latent state (an attribute, or a medical/psychiatric disorder, simply referred to as a “disorder” below) based on observed information that is related to the latent state. In order to make inferences about a particular attribute or disorder, it is also important to understand the state of the person in terms of other attributes or disorders. In particular, in medicine and psychiatry, the goal of diagnostic tools is to provide the practitioner with a short list of disorders that seem plausible as a result of the observed symptoms and personal characteristics (such as gender, ethnicity, age, etc.) of the patient. Specifically, assigning Bayesian posterior probabilities to the set of disorders is analogous to assigning a set of posterior probabilities to a set of cognitive attributes. Although probability modeling approaches have been attempted in medicine and psychiatry, probability-based IRT models have not been attempted.

Next we list medical and psychiatric diagnostic prior art instances that have a probabilistic flavor.

Bayesian Network Based Systems

A Bayesian Network for medical diagnostics represents the probabilistic relationship between disorders and symptoms/characteristics in a graph that joins nodes that are probabilistically dependent on one another with connecting lines. A good general reference is Herskovits, E. and Cooper, G., 1991, Algorithms for Bayesian belief-network precomputation, Meth. Inf. Med., 30, 81-89. A directed graph is created by the Bayes net modeling specialist and leads from the initial set of nodes that represent the set of disorders, through an optional set of intermediate nodes, to the resulting observed set of symptoms/characteristics. Given a patient's particular set of observed symptoms/characteristics, the posterior probability of having a certain disorder is calculated using the Bayes approach of Equation 4 and possibly MCMC. Here a prior distribution has been assigned to the proposed set of possible disorders, and specifying the conditional probabilities for each node given its predecessor node in the graph specifies the needed likelihood function of Equation 4. In this manner each line of the graph has a conditional probability associated with it. Medical applications of Bayesian Networks originally obtained the required numerical values for the conditional probabilities by consulting the appropriate medical literature, consulting available large data sets, or using expert opinion. More recently, estimation techniques for obtaining these conditional probabilities have been developed. Even though the ability to estimate the conditional probabilities is important for Bayesian Networks to work, the major impediment remains that many model-simplifying assumptions need to be made in order to make the network statistically tractable, as explained above in the discussion of the Bayes net prior art approach to cognitive diagnosis.
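A toy version of the posterior computation described above, with all numbers hypothetical: it assumes the patient is in exactly one of three mutually exclusive states and that symptoms are conditionally independent given that state (the simplest possible Bayes net, with no intermediate nodes). The posterior is prior times likelihood, normalized over the states. The names `PRIOR`, `P_SYMPTOM`, and `posterior` are illustrative only; real diagnostic networks allow several co-occurring disorders and intermediate nodes.

```python
# Hypothetical prior over mutually exclusive patient states.
PRIOR = {"disorder_A": 0.05, "disorder_B": 0.10, "neither": 0.85}

# Hypothetical P(symptom present | state); symptoms assumed conditionally independent.
P_SYMPTOM = {
    "disorder_A": {"fever": 0.90, "rash": 0.70},
    "disorder_B": {"fever": 0.60, "rash": 0.05},
    "neither": {"fever": 0.05, "rash": 0.02},
}


def posterior(observed: dict[str, bool]) -> dict[str, float]:
    """Posterior over states given observed symptoms (symptom name -> present?)."""
    unnormalized = {}
    for state, prior in PRIOR.items():
        likelihood = 1.0
        for symptom, present in observed.items():
            p = P_SYMPTOM[state][symptom]
            likelihood *= p if present else 1.0 - p
        unnormalized[state] = prior * likelihood
    total = sum(unnormalized.values())  # normalizing constant
    return {state: value / total for state, value in unnormalized.items()}


if __name__ == "__main__":
    for state, prob in posterior({"fever": True, "rash": True}).items():
        print(f"{state}: {prob:.3f}")
```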

Neural Network and Fuzzy Set Theory Based Systems

Both Neural Networks and Fuzzy Set Theory based approaches are graphical networks that model the probability relationships between the symptoms/characteristics and disorders using networks, and then undergo extensive training on large data sets. The networks are less rigidly specified in Neural Networks and in Fuzzy Set Theory based networks than in Bayesian Networks. The training of the networks essentially compares many models that are calibrated by the training process to find one that fits reasonably well. Fuzzy Set Theory techniques allow for random error to be built into the system. Neural Networks may also build in random error, just not in the formal way Fuzzy Set Theory does. Both systems have certain problems that result from the great freedom in the training phase: over/undertraining, determining the cases (data) to use for training (the more complex the model, the more cases are needed), determining the number of nodes, and the accessibility of appropriate data sets that will generalize well. This approach is very distinct from the parametric approach of the specified UM model. Good references are Berman, I. and Miller, R., 1991, Problem Area Formation as an Element of Computer Aided Diagnosis: A Comparison of Two Strategies within Quick Medical Reference, Meth. Inf. Med., 30, 90-95, for neural nets, and Adlassnig, K., 1986, Fuzzy Set Theory in Medical Diagnosis, IEEE Trans. Syst. Man Cybernet., SMC-16:260-265.

Deterministic Systems

Two deterministic approaches used are Branching Logic Systems and Heuristic Reasoning Systems. As discussed above in the cognitive diagnostic prior art portion, all deterministic systems have drawbacks in comparison with probability model based approaches like the UM.

SUMMARY OF THE INVENTION

The present invention does diagnosis of unknown states of objects (usually people) based on dichotomizable data generated by the objects. Applications of the present invention include, but are not limited to, (1) cognitive diagnosis of student test data in classroom instructional settings, for purposes such as assessing individual and course-wide student cognitive progress, to be used, for example, in guiding instruction-based remediation/intervention targeted to address detected cognitive deficits, (2) cognitive diagnosis of student test data in computerized instructional settings such as web-based course delivery systems, for purposes such as assessing individual and course-wide cognitive progress, to be used, for example, to guide computer-interactive remediation/intervention that addresses detected cognitive deficits, (3) cognitive diagnosis of large-scale standardized tests, thus assessing cognitively defined group-based cognitive profiles for purposes such as evaluating a school district's instructional effectiveness, and providing cognitive profiles as feedback to individual examinees, and (4) medical and psychiatric diagnosis of medical and mental disorders for purposes such as individual patient/client diagnosis, treatment intervention, and research.

In addition to doing cognitive or other diagnosis in the settings listed above, the scope of application of the present invention includes the diagnosis of any latent (not directly observable) structure (possessed by a population of individual objects, usually humans) using any test-like observed data (that is, multiple dichotomizably scored pieces of data from each object, such as the recording of multiple questions scored right/wrong for each test taker) that is probabilistically controlled by the latent structure as modeled by the UM. To illustrate, attitude questionnaire data might be diagnosed using the present invention to infer for each respondent certain attributes such as social liberal vs. conservative, fiscal liberal vs. conservative, etc.

Terminology Defined

Attribute. Any latent mental capacity that influences observable mental functioning.

Items. Questions on a test whose examinee responses can be encoded as correct or incorrect.

Residual Ability Parameter. A low-dimensional (certainly not greater than 6, often unidimensional) set of quantities that together summarize examinee proficiency on the remainder of the larger group of attributes influencing examinee performance on items.

Dichotomously scored probe. Analogous to an item in the cognitive diagnosis setting. Anything that produces a two-valued response from the object being evaluated.

Objects. Analogous to examinees in the cognitive diagnostic setting. Any set of entities being diagnosed.

Association. Any relationship between two variables, such as attributes, where the value of one variable being larger makes the other variable probabilistically tend to be larger (positive association) or smaller (negative association). Correlation is a common way of quantifying association.

Unobservable dichotomized properties. Analogous to attributes in the cognitive diagnostic setting. Any property of objects that is not observable but either has two states or can be encoded as having two states, one referred to as possessing the property and the other as not possessing the property. Applying the property appropriately means enhancing the chance of a positive response to the probes dependent on the property.

Symptoms/characteristics. Analogous to items in the cognitive diagnostic setting. Observable aspects of a patient in a medical or psychiatric setting. These can be evident, like gender or the symptom of a sore throat, or can be the result of a medical test or a question put to the patient. In current UM applications, they need to be dichotomizable.

Health or Quality of Life parameter. Analogous to the summary of the remaining attributes given by θ in the cognitive diagnostic setting. A general and broad indicator of a patient's state of medical well-being, separate from the specified disorders listed in the UM medical diagnostic application.

Disorder. Any medical or psychiatric condition that is latent, and hence needs to be diagnosed, and constitutes the patient being unwell in some regard.

Probe. Analogous to an item in the cognitive diagnostic setting. Something that brings about a two-valued response from an object being diagnosed.

Positive or negative response to a probe. Analogous to getting an item correct or incorrect in the cognitive diagnostic setting. Positive and negative are merely labels given to the two possible responses to a probe, noting that sometimes a “positive” response is contextually meaningful and sometimes it isn't.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the standard logistic item response function P(θ) used as the basic building block of IRT models in general and of the UM in particular.

FIG. 2 displays the flow chart for the prior art 1995 proposed UM cognitive diagnostic procedure.

FIG. 3 displays a schematic of the 1995 UM probability model for the random response X_(ij) of one examinee to one item, indicating the examinee parameters and item parameters influencing the examinee response X_(ij).

FIG. 4 displays the standard normal probability density function assumed for the distribution of examinee residual ability θ in the UM.

FIG. 5 displays an informative triangular prior density f(p) for the parameter p=Prob(cure) in a statistical drug trial study.

FIG. 6 displays a vague (relatively uninformative) Bayes prior density f(p) for the parameter p=Prob(cure) in a statistical drug trial study.

FIG. 7 displays a totally uninformative Bayes prior density f(p) in a statistical drug trial study.

FIG. 8 displays the components of the basic Bayes probability model statistical inference paradigm.

FIG. 9 displays the likelihood function f(X|p) for p=Prob(cure) in a statistical drug trial study where the data were 30 cures out of 40 trials, indicating that p=0.75 maximizes the likelihood function.

FIG. 10 simultaneously displays the prior density, the likelihood function f(X|p)=f(30 cures out of 40|p), and the posterior distribution for p, where p=Prob(cure), in a statistical drug trial study producing 30 cures out of 40 trials. This illustrates the effect of a Bayesian prior distribution on the standard statistical maximum likelihood estimate of p=0.75, producing the Bayesian posterior estimate of p=0.72.

FIG. 11 displays the exponential density e^(−x) from which observations are simulated in order to evaluate the integral of Example 4.

FIG. 12 displays a flow chart of Robert Mislevy's Bayes probability inference network approach to cognitive diagnosis.

FIG. 13 displays a flow chart of Kikumi Tatsuoka's Bayesian Rule Space approach to cognitive diagnosis.

FIG. 14 displays a flow chart of Susan Embretson's GLTM approach to cognitive diagnosis.

FIG. 15 displays a schematic of the UM likelihood for the random response of one examinee to one item, indicating the examinee parameters and item parameters influencing the examinee response X_(ij) for the reparameterized Unified Model used in the present invention.

FIG. 16 displays the dependence representation of the identifiable Bayesian version of the reparameterized UM used in the invention, including prior distributions and hyperparameters.

FIG. 17a displays the flow chart of the UM cognitive diagnosis procedure used in the present invention.

FIG. 17b displays the flow chart of the UM medical/psychiatric diagnosis procedure used in the present invention.

FIG. 17c displays the flow chart of the general UM procedure used in the present invention.

FIG. 18 displays a page of the introductory statistics exam to illustrate items simulated in the UMCD demonstration example.

FIG. 19 displays an item/attribute incidence matrix for the introductory statistics exam simulated in the UMCD demonstration example.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is based in part on the discovery of failings of the UM approach proposed by DiBello et al in 1995. These were overparameterization that caused parameter nonidentifiability; the failure to set mastery levels, which was a further cause of nonidentifiability and raised substantive issues of interpretation for the user; the lack of a practical and effective calibration procedure; and a failure to model the natural positive correlational structure existing between attributes so as to improve cognitive diagnostic accuracy. These failings are discussed first. To do so, more must be understood about parameterization and identifiability.

Nonidentifiability and Model Reparameterization in Statistical Modeling

In statistical modeling, a model with fewer parameters that describes reality reasonably well is much preferred to a model with more parameters that describes reality at best a bit better. This is especially important if the model with more parameters has nonidentifiable parameters, namely parameters that statistically cannot be separated from one another, that is, parameters that cannot be estimated at all from the data. A trivial example illustrates the important ideas of nonidentifiability and the need for reparameterization. Consider the model y=a+bx+cx. This model has three parameters a, b, c. But the model is over-parameterized in that b and c play exactly the same role (a parameter multiplying the variable x) and hence cannot be statistically distinguished from each other: the data determine only the sum b+c. Thus the model parameters b and c are nonidentifiable and cannot be estimated from available data. The two-parameter model y=a+bx is superior because it has one less parameter, all its parameters are identifiable, and it describes reality just as well. With the present invention, the not-useful and nonidentifiable 1995 UM was reparameterized by reducing the number of parameters through the introduction of a smaller yet substantively meaningful set of parameters and through specifying attribute mastery levels, thereby producing all identifiable, and hence estimable, parameters.
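The nonidentifiability of y=a+bx+cx can be checked numerically with a short sketch (the parameter values and x values are arbitrary assumptions): two different (b, c) pairs with the same sum produce identical predictions at every x, so no data set can distinguish them, while the identifiable model y=a+bx with slope equal to that sum reproduces the same predictions.

```python
def predict_overparam(a: float, b: float, c: float, x: float) -> float:
    """Over-parameterized model y = a + b*x + c*x."""
    return a + b * x + c * x


def predict_identifiable(a: float, b: float, x: float) -> float:
    """Identifiable model y = a + b*x."""
    return a + b * x


XS = [0.0, 1.5, 3.0, 7.25]

# Two different (b, c) pairs with the same sum b + c = 3 ...
fits_1 = [predict_overparam(2.0, 1.0, 2.0, x) for x in XS]
fits_2 = [predict_overparam(2.0, 2.5, 0.5, x) for x in XS]
# ... and the identifiable model whose single slope equals that sum.
fits_id = [predict_identifiable(2.0, 3.0, x) for x in XS]

if __name__ == "__main__":
    print(fits_1)
    print(fits_2)
    print(fits_id)
```

Because every observation depends on b and c only through b+c, any estimation procedure can at best recover that sum, which is exactly what the identifiable parameterization estimates.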

The General Approach to Reparameterization

Assume a model with a meaningful set of K parameters; i.e., the parameters have useful real-world substantive interpretations (like velocity, mass, acceleration, etc., do in physics models). The general method is, for k<K, to define new and meaningful parameters a₁, a₂, . . . , a_(k), each a_(i) being a different function of the original set of K parameters. It is desirable to choose the functions so that the new set of parameters is both identifiable and substantively meaningful. A valid reparameterization is not unique, and there thus exist many useful and valid reparameterizations.

Now consider the nonidentifiability in the 1995 UM.

Sources of Nonidentifiability in the Prior Art 1995 UM of DiBello et al.: Failure to Parameterize Parsimoniously and Failure to Specify Mastery Levels

It has been discovered that the source of the nonidentifiability was twofold. First, the number of parameters had to be reduced by a substantively meaningful reparameterization using the general approach explained above.

Second, it was discovered that it is necessary, as part of the model, to specify the mastery level for each attribute in the model. Essentially, specifying the mastery level defines how proficient an examinee must be in applying an attribute to items in order to be classified as having mastered the attribute. This mastery specification is needed not only to achieve identifiability but also so that users are empowered to draw substantively meaningful conclusions from the UM cognitive diagnoses. Indeed, it is a meaningless claim to declare an examinee a master of an attribute unless the user knows what attribute mastery actually means in the context of the items that make up the test. Thus, any cognitive diagnostic model that fails to somehow set mastery levels has a fundamental flaw that will cause serious malfunctioning.

Failure to Use the Positive Correlational Structure of Attributes in the 1995 UM

Another problem discovered with the 1995 UM was that much of the information about the association between attributes available in examinee data was not being taken advantage of, a flaw correctable by carefully recasting the model as a Bayesian model. Of course, other ways to capture much of the available information may also be found in the future, so Bayes modeling is not necessarily the only choice.

The result of dealing effectively with these discoveries (overparameterization, lack of mastery specification, failure to use the attributes' positive associational structure) is a practical and powerful cognitive diagnostic procedure that can be applied to actual test data to produce actual cognitive diagnoses for examinees taking the test, namely the UMCD of the present invention.

Failure to Achieve Calibration of the 1995 UM

Just as fundamental to the development of a useful UM-based cognitive diagnostic procedure was finding a useful calibration procedure. In fact, calibration of the model had not been accomplished in DiBello et al. Both the nonidentifiability and the non-Bayesian character of the model were barriers to calibration. Not achieving such calibration had precluded doing effective cognitive diagnosis. The recent popularization of the new data computational MCMC approach allows the calibration of Bayes models, even when the models are parametrically very complex. This suggested that recasting the 1995 UM as a Bayes model was a viable strategy for achieving effective calibration of the model. Again, it must be made clear that without calibration, cognitive diagnosis is impossible no matter how realistic the model is. For example, the illustration of a simulated UM-based cognitive diagnosis presented in DiBello et al was achieved only by pretending that the UM had been calibrated, contrary to what was statistically possible at the time of the publication of the paper. Thus cognitive diagnosis using the 1995 UM was not possible at the time of its publication, and indeed was not possible until the Bayes UM of the present invention, with identifiable parameters and specified mastery levels, was developed together with its MCMC-based model calibration.

Now the reparameterization developed for use in the UMCD of the present invention is discussed.

The Reparameterization Used to Replace the Overparameterization of the 1995 UM

In particular, a reparameterization of the non-Bayesian UM as it was published in DiBello et al was necessary to make the parameters “identifiable” (Equation 5 below). It was realized that reparameterization of the 1995 UM was required for adequate cognitive diagnosis. That is, the original parameters that were redundant in the UM had to be replaced, even though substantively they had meaningful interpretations. (A non-Bayes UM reparameterization is conceptually analogous to replacing the nonidentifiable, overparameterized model y=a+bx+cx by the simpler, identifiable model y=a+bx, as presented above.)

Moreover, the reparameterization had to result in identifiable parameters that “made sense” by being easily understood by actual practitioners. The particular choice of reparameterization, as explained below, seems to be an essential reason why the UM procedure works well in applications and is easy for users to understand and interpret.

Basic concepts of the recast UM used in the invention are explained next. Frequent referral to FIG. 15, comparing FIG. 15 with FIG. 3, and examining Equations 5 and 6 is essential. Understanding what is unique about the UM as modeled by the present invention is key to understanding what is unique and effective about the cognitive diagnostic algorithm of the present invention. Some of this has already been explained in the description of the prior art 1995 version of the UM. What makes the UMCD work effectively to do cognitive diagnoses is unique to FIG. 15 and Equations 5 and 6 described below.

As already stated, one cognitive construct of fundamental importance in the UM is positivity, which is made explicit in Equation 5 for S_ij using the reparameterized π* and r* of Equation 6, as explained below. Equation 5 is analogous to Equation 3 for S_ij, which used the original parameterization in terms of r and π. Both equations give the probability that the included attributes are applied correctly to the solution of Item i by Examinee j. Equation 5 reparameterizes the π's and r's in order to obtain substantively meaningful parameters that are identifiable. The Equation 3 version of S_ij is replaced with the Equation 5 version below, noting that both formulas produce the same value for S_ij:

$$S_{ij} = \pi_i^{*}\,(r_{i1}^{*})^{1-\alpha_{j1}}\,(r_{i2}^{*})^{1-\alpha_{j2}}\cdots(r_{im}^{*})^{1-\alpha_{jm}} \qquad (5)$$

As stated above, the general approach to reparameterization requires defining the new identifiable parameters (π*'s, r*'s) as functions of the old, nonidentifiable parameters (π's, r's). This is done as follows. Consider an item i requiring attributes k = 1, ..., m. Then defining

$$\pi_i^{*} = \prod_{k=1}^{m} \pi_{ik}, \qquad r_{ik}^{*} = \frac{r_{ik}}{\pi_{ik}} \qquad (6)$$

produces the reparameterization. Note that there are 2m original parameters π_ik and r_ik, but only m+1 new parameters π*_i and r*_ik.
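A minimal Python sketch of Equations 5 and 6 follows. It assumes that Equation 3 (not reproduced in this section) has the 1995 UM product form S_ij = ∏_k π_ik^(α_jk) r_ik^(1−α_jk); the function names and numeric values are hypothetical. The loop checks that the original and reparameterized forms give the same S_ij for every mastery pattern, even though the new form uses m+1 parameters instead of 2m.

```python
from itertools import product
from math import isclose, prod


def s_original(pi, r, alpha):
    """Assumed Equation 3 form: pi_ik if attribute k is mastered, r_ik if not."""
    return prod(p if a == 1 else q for p, q, a in zip(pi, r, alpha))


def reparameterize(pi, r):
    """Equation 6: pi*_i = prod_k pi_ik and r*_ik = r_ik / pi_ik."""
    pi_star = prod(pi)
    r_star = [q / p for p, q in zip(pi, r)]
    return pi_star, r_star


def s_reparameterized(pi_star, r_star, alpha):
    """Equation 5: pi*_i times r*_ik for every required attribute not mastered."""
    return pi_star * prod(rs ** (1 - a) for rs, a in zip(r_star, alpha))


# Hypothetical item requiring m = 3 attributes: 2m = 6 original parameters.
pi = [0.95, 0.90, 0.85]
r = [0.20, 0.30, 0.10]
pi_star, r_star = reparameterize(pi, r)   # m + 1 = 4 identifiable parameters

# Both parameterizations agree for all 2^m mastery patterns.
for alpha in product([0, 1], repeat=len(pi)):
    assert isclose(s_original(pi, r, alpha), s_reparameterized(pi_star, r_star, alpha))

print(round(pi_star, 6))  # → 0.72675
```

For an examinee who has mastered every required attribute, all exponents 1−α_jk are zero and S_ij reduces to π*_i, which is why π*_i reads as the item's difficulty for full masters.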

As stated, the i-th item requires m attributes labeled 1, 2, ..., m, and α_jk = 1 or 0 denotes whether or not examinee j has mastered attribute k. Then π*_i is interpreted as the probability that an examinee who has mastered all of the required attributes for item i indeed applies them correctly. That is, π*_i is a measure of how difficult the item is for an examinee who has mastered all the required attributes.

Next, r*_i1 for Attribute 1 is, by its definition above, the probability of applying the attribute correctly to Item i if it is not mastered divided by the probability of applying it correctly if it is mastered. The r*'s for the other attributes are defined similarly. A value r*_ik ≈ 0 for an Attribute k means that there is a big advantage to having mastered the attribute when trying to answer Item i correctly. An r*_ik relatively close to 1 means that there is little advantage to having mastered Attribute k over not having mastered it when trying to solve Item i.

If π*_i is close to 1 and all the r*_ik's are close to 0 for Item i, then the required attributes are referred to as highly positive for Item i. “Highly positive”, as before, means that with high probability an examinee uses the attributes required for the item correctly if and only if the examinee possesses all of the attributes that the model says are needed for the item.

It should be noted that the r*'s and the π*'s, together with the mastery-setting p_k's of FIG. 16 (mastery setting is explained below), are sufficient to produce the identifiability that was missing in DiBello et al. That is, once attribute mastery levels are specified, this number of parameters is sufficient to achieve identifiability.

The Hierarchical Bayes UM, Including the Setting of Mastery Levels and the Introduction of an Attribute Positive Correlational Structure

The Bayesian portion of the Bayes UM is as important as the reparameterized UM formula for achieving effective and powerful cognitive diagnoses. This is done by introducing a Bayes model with hyperparameters, that is, a hierarchical Bayes model. As stated in the Description of the Prior Art section, a Bayesian model is a probability model for which the model parameters are also assigned a probability distribution. A Bayesian model with hyperparameters is a Bayesian model in which the prior distributions of the basic parameters of the model are in turn given parameters, each having a prior distribution. These additional parameters that control the prior distribution of the usual model parameters are referred to as hyperparameters. A good reference for Bayes modeling in general and hierarchical Bayes modeling in particular is Gelman et al.

FIG. 16 schematically displays the hierarchical Bayes model for an examinee responding to an item as modeled by our hierarchical Bayes UM. As such, it is an augmentation of the reparameterized likelihood schematic of FIG. 15.

In the FIG. 16 diagram, the model parameters π*, r*, and c/3 have a prior beta distribution, denoted β(a,b), for each item i, each such distribution being determined by two parameters (a,b). Beta distributions tend to work well as prior distributions for parameters that are constrained to lie in the interval (0,1), as explained in Chapter 2 of the Gelman et al book, and this is true of π*, r*, and c/3. In particular, the beta distribution parameters (a,b) provide a rich family of densities from which just about any shape for the prior may be selected, an attractive property from the modeling perspective. Each (a,b) hyperparameter has been given a uniform distribution on the interval (0.5,2). This means that every value of such a hyperparameter, a_r say, within the interval (0.5,2) is equally likely. This uniform prior over a wide interval is the kind of relatively non-informative (vague) prior that is effective in hierarchical Bayes models, in that it allows the model to fit the data well without the prior having an inappropriately strong influence on the statistical inference. These distributional choices (beta, uniform) are fairly standard, although a certain amount of judgment is required to construct prior distributions for the relevant variables.
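The two-level prior can be sketched with Python's standard library by drawing hyperparameters uniformly on (0.5, 2) and then drawing π*, r*, and c/3 from β(a, b). This is only a minimal illustration of sampling from the prior, not calibration code. It assumes one (a, b) pair per parameter type shared by all items, a detail the text leaves open; the function name draw_item_priors is hypothetical.

```python
import random


def draw_item_priors(n_items, seed=None):
    """Draw pi*, r*, and c for n_items items from a beta prior with uniform hyperpriors.

    Assumption: one (a, b) pair per parameter type, shared by all items, with each
    component uniform on (0.5, 2). c is handled through c/3 so that c lies in (0, 3).
    """
    rng = random.Random(seed)
    hyper = {name: (rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0))
             for name in ("pi_star", "r_star", "c_over_3")}
    items = []
    for _ in range(n_items):
        draw = {name: rng.betavariate(a, b) for name, (a, b) in hyper.items()}
        items.append({"pi_star": draw["pi_star"],
                      "r_star": draw["r_star"],
                      "c": 3.0 * draw["c_over_3"]})
    return hyper, items


hyper, items = draw_item_priors(40, seed=1)
print(hyper["r_star"])   # the (a, b) pair governing the r* prior
print(items[0])          # one item's prior draw of pi*, r*, and c
```

In the actual procedure these hyperparameters are themselves updated by MCMC together with the item and examinee parameters; the sketch only shows how the hierarchy generates values.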

The Bayesian structure associated with the examinee latent ability parameters (that is, the incompleteness residual ability θ and the attribute mastery/nonmastery components of α) is now explained. This explanation highlights two important components of the current UM procedure, namely specifying attribute mastery levels and assuming a positive correlational attribute structure as part of the Bayes model. It is assumed that the examinee attributes and θ are derived from a multivariate normal distribution with positive correlations. The multivariate normal distribution is a standard and well-understood distribution for statisticians. For example, if a person's weight and height are measured, the standard model is a bivariate normal distribution with weight and height positively correlated. For more information, consult any standard statistics textbook.

Specifying the prior distribution of attributes α and θ is done in two stages. At stage one, (θ,α′) is given a multivariate normal prior, where α′ is the continuous precursor of the dichotomous (0/1 valued) components of α that specify mastery or nonmastery of each attribute for each examinee. The attribute pair correlations σ_kk′ (hyperparameters) for α′ are assigned a uniform prior distribution on the interval (0,1), because all that is known about them is that they are positive. At stage two, the attribute mastery/nonmastery vector α is obtained by dichotomizing each component of α′ into a 1 or a 0 according as its value is larger or smaller than the user-specified mastery level, which is determined most simply by the user-specified examinee mastery proportions (probabilities) p_k for each attribute. That is, the user specifies what it means to be a master of an attribute by specifying the proportion of masters of each attribute (other methods of specifying attribute mastery exist and may in fact be preferable, but this is the most straightforward). For example, if the user specifies p_k=0.7, then attribute k is said to be mastered by 70% of the examinees: α_k=1 70% of the time, namely when its corresponding α′_k is sufficiently large, and α_k=0 the other 30% of the time.
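The two-stage construction can be sketched as follows, assuming for simplicity that θ and all components of α′ share one pairwise correlation ρ. That assumption lets the draw be built from a single common normal factor instead of a full covariance matrix. Dichotomizing each α′_k at the cutoff z_k with P(Z > z_k) = p_k makes α_k = 1 for a proportion p_k of examinees. The names mastery_cutoff and draw_examinee are hypothetical.

```python
import random
from statistics import NormalDist


def mastery_cutoff(p_k):
    """Cutoff on the N(0, 1) scale with a proportion p_k of the mass above it."""
    return NormalDist().inv_cdf(1.0 - p_k)


def draw_examinee(p, rho, rng):
    """Return (theta, alpha) for one examinee.

    Stage one: theta and alpha'_1..alpha'_K are standard normal with common
    pairwise correlation rho >= 0 (a shared factor plus independent noise).
    Stage two: alpha_k = 1 when alpha'_k exceeds the cutoff implied by p_k.
    """
    shared = rng.gauss(0.0, 1.0)

    def component():
        return rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)

    theta = component()
    alpha_prime = [component() for _ in p]
    alpha = [1 if a > mastery_cutoff(p_k) else 0 for a, p_k in zip(alpha_prime, p)]
    return theta, alpha


rng = random.Random(7)
p = [0.7] * 8                         # user-specified mastery proportions p_k
sample = [draw_examinee(p, 0.3, rng)[1] for _ in range(20000)]
print(sum(a[0] for a in sample) / len(sample))   # close to 0.7
```

Raising p_k lowers the cutoff, so more examinees are labeled masters; lowering p_k raises it.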

To help explain the need to specify mastery levels, consider the following thought experiment. What does it mean to say that somebody displays mastery of the factorization of polynomials (Attribute 1)? Clearly, disagreement could arise over the appropriate level of competency required. Specifying that 60% (p₁=0.6) of the population of examinees are masters has the effect of defining the mastery level precisely. Choosing 80% instead has the effect of demanding a lower level of cognitive functioning before labeling a person as having mastered the attribute, since more examinees are then counted as masters; choosing a smaller proportion demands a higher level.

In addition to the importance of specifying mastery levels, it must be reemphasized that the positive correlational structure for the attribute pairs of α assumed in the Bayes portion of the UM improves cognitive diagnostic accuracy. This positive correlational structure allows the model to capture the all-important fact that examinees who have mastered one attribute are more likely to have mastered another; that is, attributes are positively correlated or, more simply, positively associated. Moreover, this very important building-in of a positive correlational structure for the attributes was done by casting the UM in a Bayes framework. However, the present invention is not limited to the Bayesian framework. Thus an effective positive correlational attribute structure (currently achieved using a Bayes approach), the reparameterization that makes the UM identifiable, and the specification of mastery levels are all components useful for producing an effective UMCD. That is, each of these, in combination with the others and in combination with the UM (defined as any attribute-based diagnostic model using positivity and completeness to develop its equations), contributes to the performance of the present invention.

FIG. 16 schematically shows one embodiment of the hierarchical Bayes UM in the UMCD. The present invention is not limited to this embodiment of the UMCD with its Bayes model and cognitive diagnostic MCMC computational algorithm.

It is important to realize that converting a non-Bayesian probability model to a Bayes probability model is an activity whose details differ entirely from application to application; such conversions are seldom the same. Thus, the effort begins afresh for each distinct new setting where Bayes modeling of the data is required. In particular, there is not one right way to develop an appropriate Bayes model. Moreover, an appropriately chosen Bayes model, as was done for the UM, can make effective use of all the information in the data and hence achieve much more accurate inferences (in this case, much more accurate cognitive diagnoses).

FIG. 17a provides a flow chart of the method of the present invention. First note that Blocks 201, 203, 205, and 207 are identical to the UM-based blocks of FIG. 2. This reflects that both take the same approach except for the details of the UM used. Thus the non-Bayesian approach of FIG. 2 and the Bayes approach of FIG. 17a diverge from Block 205 down. First, although both require a likelihood model, as already discussed, reparameterization issues related to the nonidentifiability of the 1995 UM led to the discovery of the reparameterization given in Equation 5 to replace the old parameterization of Equation 3. Further, building the likelihood model (Blocks 209 and 1701 respectively) now also requires a “Build UM Bayes prior f(ω)” block (Block 1703), thus producing the Bayes model of Block 1705. Blocks 1701, 1703 and 1705 of FIG. 17a reflect Equations 5 and 6 as well as the FIG. 16 schematic. Blocks 1707, 1709, and 1711 are understood as follows. The needed posterior distribution f(ω|X) is obtained via MCMC as explained above (Block 1707). Then the marginal posterior probabilities of the individual α_jk's (used to make individual attribute/examinee cognitive diagnoses) are extracted from the posterior distribution f(ω|X) by standard techniques, yielding Block 1709, which gives Prob(α_jk=1|X) for each examinee/attribute combination. Then, using a strength-of-evidence rule such as the one illustrated in the example below, cognitive diagnoses are obtained for every examinee/attribute combination (Block 1711).

A Brief Description of the MCMC Algorithm Used in the Bayes UM of the Invention

The general MCMC algorithmic approach used for the Bayesian UM is described in Patz et al in sufficient detail for people with ordinary skill in the art to create and use it. As already stated, the approach is referred to as the Metropolis-Hastings algorithm embedded within a Gibbs sampler, or M-H within Gibbs for short. The Metropolis-Hastings algorithm simplifies the calculation of the posterior distribution by eliminating the calculation of the denominator (see Equation 4) usually present in posterior distribution calculations. The Gibbs sampler allows the remainder of the calculation (the numerator of Equation 4) to be partitioned into bundles that are individually easier to calculate than they are jointly (because jointly the calculations depend on one another). M-H within Gibbs is one of numerous variations of the basic MCMC approach.

In the case of MCMC, the simulated random numbers of the Markov chain are probabilistically dependent (like the daily high temperatures on two consecutive days). As is carefully explained in Patz et al (and in any other good general reference on Bayesian analysis using MCMC, such as Gelman et al or Gilks et al), MCMC simulation entirely avoids computing (or even simulating) the integral in the denominator and instead produces a “chain” of random numbers whose steady-state probability distribution is the desired posterior distribution. In simple and practical terms, this means that if the chain can be run for a long time, then the observed distribution of its simulated random numbers approximates the required posterior distribution, thus bypassing the direct or simulated computation of it.

As a practical matter, in the Bayes UM setting, MCMC estimates the required posterior distribution with surprising accuracy because a large number of random numbers of the chain are generated. In particular, the procedure of the present invention typically runs a chain of length 15000, with the first 5000 generated simulations of the chain thrown out because they are not yet in the required steady state. The MCMC simulation approach is at present the only viable approach for statistically analyzing parametrically complex Bayes models.
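The following toy sketch illustrates M-H within Gibbs. It is not the UMCD sampler, which per Patz et al updates all item, examinee, and hyperparameter blocks. It calibrates only one hypothetical item's π* and r* under uniform priors, ignores the completeness term, and treats the examinees' mastery of that item's attributes as known: masters answer correctly with probability π* and non-masters with probability π*·r*. Each sweep updates one parameter at a time with a random-walk Metropolis step. The acceptance test uses only a ratio of unnormalized posteriors, so the denominator of Equation 4 is never computed. As in the text, 15000 sweeps are run and the first 5000 discarded. The data counts and step size are assumptions.

```python
import math
import random


def log_post(pi_star, r_star, n1, s1, n0, s0):
    """Unnormalized log posterior of one item's (pi*, r*) under uniform priors.

    n1 masters (s1 correct) succeed with probability pi*; n0 non-masters
    (s0 correct) with probability pi* * r*. No normalizing constant is
    computed, because Metropolis-Hastings only needs posterior ratios.
    """
    if not (0.0 < pi_star < 1.0 and 0.0 < r_star < 1.0):
        return -math.inf
    q = pi_star * r_star
    return (s1 * math.log(pi_star) + (n1 - s1) * math.log(1.0 - pi_star)
            + s0 * math.log(q) + (n0 - s0) * math.log(1.0 - q))


def mh_within_gibbs(n1, s1, n0, s0, length=15000, burn_in=5000, step=0.05, seed=0):
    """Each sweep applies a random-walk Metropolis update to each parameter in turn."""
    rng = random.Random(seed)
    state = {"pi_star": 0.5, "r_star": 0.5}
    kept = []
    for t in range(length):
        for name in ("pi_star", "r_star"):
            proposal = dict(state)
            proposal[name] += rng.gauss(0.0, step)
            log_ratio = (log_post(proposal["pi_star"], proposal["r_star"], n1, s1, n0, s0)
                         - log_post(state["pi_star"], state["r_star"], n1, s1, n0, s0))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                state = proposal
        if t >= burn_in:          # discard draws taken before the steady state
            kept.append((state["pi_star"], state["r_star"]))
    return kept


# Hypothetical data: 200 masters (180 correct), 200 non-masters (40 correct).
draws = mh_within_gibbs(200, 180, 200, 40)
mean_pi = sum(p for p, _ in draws) / len(draws)
mean_r = sum(r for _, r in draws) / len(draws)
print(len(draws), round(mean_pi, 2), round(mean_r, 2))
```

With these counts the posterior means of π* and r* should land near 0.9 and 0.22, close to the values 180/200 and (40/200)/0.9 suggested by the data.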

Recall that the essence of a statistical analysis is the caution not to go beyond the sometimes limited evidence supporting the inferential conclusions drawn. In the case of the present invention, this relates to Block 1711 of FIG. 17a, where inferences about mastery versus nonmastery are sometimes withheld for certain examinee/attribute combinations due to lack of strong statistical evidence.

Requiring Strong Statistical Evidence to Make an Inference of Mastery or Nonmastery (Block 1711 of FIG. 17a)

Referring back to the cognitive example of the statistics test, Susan might be inferred to have a posterior probability of mastery of 0.1 for histograms (Attribute 1), 0.53 for medians/quantiles (Attribute 2), 0.81 for averages/means (Attribute 3), etc. The current Bayes UM cognitive diagnostic mastery assignment rule assigns mastery for posterior probabilities above 0.65 and non-mastery for posterior probabilities below 0.35, and withholds mastery assignment otherwise (see Block 1711; this is a convention that is certainly subject to change). Cutoff values of 0.8 and 0.2 are sometimes used when very strong evidence is demanded before assigning mastery or non-mastery.

Suppose the 0.35 and 0.65 cutoff values are applied. Then, because Susan's posterior probability of 0.81 is greater than 0.65, Susan is judged to have mastered averages/means; because 0.1 is less than 0.35, Susan is judged not to have mastered histograms; and because 0.53 is above the cutoff for non-mastery and below the cutoff for mastery, judgment is withheld for medians/quantiles. This capability to withhold assignment when the data do not contain enough information to provide strong evidence of attribute mastery or non-mastery is a real strength of the UM statistical method.
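The strength-of-evidence rule of Block 1711 can be written as a small function. The cutoffs are parameters because the text treats 0.65/0.35 as a changeable convention and mentions 0.8/0.2 for stricter evidence. The function name classify is hypothetical.

```python
def classify(posterior, master_cut=0.65, nonmaster_cut=0.35):
    """Return 'master', 'non-master', or None when the evidence is too weak."""
    if posterior > master_cut:
        return "master"
    if posterior < nonmaster_cut:
        return "non-master"
    return None  # diagnosis withheld


susan = {"histograms": 0.10, "medians/quantiles": 0.53, "averages/means": 0.81}
print({k: classify(v) for k, v in susan.items()})
# → {'histograms': 'non-master', 'medians/quantiles': None, 'averages/means': 'master'}
print(classify(0.75, master_cut=0.8, nonmaster_cut=0.2))  # → None
```

The last line shows the trade-off: under the stricter 0.8/0.2 cutoffs, a posterior of 0.75 that would count as mastery under 0.65/0.35 is withheld instead.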

A Computer Simulation Study of UMCD Applied to Test Data Using the Cognitive Structure From the Introductory Statistics Exam of Example 2

The purpose here is twofold. First, it is desired to lay out further the major steps in using the current UMCD, so as to make explicit how the procedure is carried out. Second, evidence of the effectiveness of the present invention in achieving cognitive diagnosis is given.

A computer simulation study is constructed to demonstrate the power of the current UMCD to cognitively diagnose student attribute mastery based upon the introductory statistics exam referred to earlier in Example 2 (refer also to FIG. 19 for the specific item/attribute structure). This simulation is described by following the flow chart of FIG. 17a.

A computer was programmed to generate data using the cognitive structure from the exam. FIG. 18 gives a sample set of questions (items 9-18) of this 40-question exam (Block 203 of FIG. 17a).

The eight attributes described earlier were chosen (Block 201). The attribute/item structure is given in the item/attribute incidence matrix table of FIG. 19 (Block 205). The user, in this case the patent applicants, developed this matrix.

Recall the eight statistics knowledge attributes from Example 2: (1) histogram, (2) median/quartile, (3) average/mean, (4) standard deviation, (5) regression prediction, (6) correlation, (7) regression line, and (8) regression fit. For example, Item 17 of FIG. 18 requires attributes (1), (3), and (4). As in this simulation example, in a typical application of the UMCD the user will construct the test questions and decide on the major attributes to be diagnosed, and hence made part of α (perhaps selecting the attributes first and then developing questions designed to diagnose these attributes). Referring to the item/attribute table of FIG. 19, in order to simulate data with the desired positivity and completeness, parameters were generated for the 40 items that allow for slight to moderate incompleteness and slight to moderate non-positivity but in general reflect a test with a highly cognitive structure, and simulated examinee response data were created (that is, for each of the 500 simulated examinees, a string of 40 0s and 1s was simulated, indicating which items are answered correctly and which incorrectly). “Slight to moderate incompleteness” means that whether or not an examinee gets an item correct depends mostly on which of the eight specified attributes relevant to that item the examinee possesses and lacks. The slight to moderate incompleteness in the simulated data was achieved by spreading the c values fairly uniformly between 1.5 and 2.5. The (perhaps many) other attributes influencing performance on the items are assumed to have only a minor influence.

“Slight to moderate non-positivity” means that examinees lacking any of an item's required attributes (from among the eight listed attributes) will likely get the item wrong. The slight to moderate non-positivity was achieved by spreading the r*'s fairly uniformly between 0 and 0.4 and the π*'s fairly uniformly between 0.7 and 1. Noting that incompleteness is also slight to moderate, as just discussed, it can be seen that an examinee possessing all the item's required attributes will likely get the item right, while an examinee lacking at least one required attribute will likely get the item wrong.

The abilities θ and attributes α for 500 simulated examinees were generated with each attribute having a mastery rate of 50% and with the residual θ abilities distributed according to a standard normal distribution. Further, the correlations between attribute pairs and between (α, θ) pairs were assumed to be around 0.3, as was judged to be realistic. For example, Examinee 1 might be simulated to have α=(0 1 1 1 0 1 1 1), amounting to mastery of six of the eight major attributes.

Then, for each examinee and each item, the simulation in effect flips a coin weighted by the examinee's predicted probability of correctly responding to the item according to the UM of Equations 1, 2, 5, and 6. A sample of 500 examinees taking the test (Block 207) was simulated because that is the approximate size of (or even smaller than) a typical large introductory statistics course at a large university in a semester. It is also a reasonable size for all the students taking a core course (like Algebra II) within a fairly large school district.
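The data-generating step can be sketched as below. Because Equations 1 and 2 are not reproduced in this section, the sketch assumes the response probability is S_ij × P(θ_j + c_i), with P taken to be a 1.7-scaled logistic curve. The FIG. 19 incidence matrix is also not reproduced, so a hypothetical random one (1 to 3 attributes per item) stands in for it. Item parameters and examinee traits follow the ranges given above, with the 0.3 correlation imposed on the continuous precursors (θ, α′) through a shared normal factor. All names are hypothetical.

```python
import math
import random


def logistic(x):
    """Assumed form of P(.) in Equations 1 and 2: a 1.7-scaled logistic curve."""
    return 1.0 / (1.0 + math.exp(-1.7 * x))


def simulate_responses(n_examinees=500, n_items=40, n_attrs=8, rho=0.3, seed=0):
    rng = random.Random(seed)
    # Hypothetical incidence matrix standing in for FIG. 19.
    q_matrix = [sorted(rng.sample(range(n_attrs), rng.randint(1, 3)))
                for _ in range(n_items)]
    # Item parameters spread as described in the text.
    pi_star = [rng.uniform(0.7, 1.0) for _ in range(n_items)]
    r_star = [[rng.uniform(0.0, 0.4) for _ in range(n_attrs)] for _ in range(n_items)]
    c = [rng.uniform(1.5, 2.5) for _ in range(n_items)]

    data, alphas = [], []
    for _ in range(n_examinees):
        # theta and alpha' are standard normal with pairwise correlation rho;
        # alpha'_k > 0 yields the 50% mastery rate used in the study.
        shared = rng.gauss(0.0, 1.0)
        latent = [math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
                  for _ in range(n_attrs + 1)]
        theta, alpha = latent[0], [1 if z > 0 else 0 for z in latent[1:]]
        row = []
        for i in range(n_items):
            # Equation 5 over the attributes item i requires, times completeness.
            s_ij = pi_star[i] * math.prod(r_star[i][k] ** (1 - alpha[k]) for k in q_matrix[i])
            p_correct = s_ij * logistic(theta + c[i])
            row.append(1 if rng.random() < p_correct else 0)  # weighted coin flip
        data.append(row)
        alphas.append(alpha)
    return data, alphas


data, alphas = simulate_responses()
print(len(data), len(data[0]))  # → 500 40
```

The returned alphas play the role of the known "true" mastery patterns against which the UMCD's diagnoses would be scored.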

The goal of this study is to observe how effective the UMCD is in recovering the known cognitive abilities of the examinees (the cognitive abilities are known, recall, because they were generated by a known simulation model fed to the computer). Assessing a statistical method such as the UMCD in a realistic computer simulation is one of the fundamental ways statisticians determine its effectiveness. Indeed, the fact that the simulation model, and hence the parameters generating the data, are known is what makes simulation studies so useful for evaluating the effectiveness of a statistical procedure.

Blocks 205, 1701, 1703, and 1705 of FIG. 17a constitute the assumed Bayes model, as given by Equations 1, 2, 5, and 6. The simulated examinee response data (a 500 by 40 matrix of 0s and 1s, Block 207) were analyzed using MCMC (Block 1707) according to the identifiable Bayes UM schematically given in FIG. 16. For each examinee/attribute combination, a chain of length 15,000 was generated, with the first 5000 values discarded to avoid any potential influence of the starting values of the chain (Block 1707). According to MCMC theory, the remaining 10,000 values estimate the desired posterior distribution of attribute mastery for each examinee. For example, if the chain for Examinee 23 and Attribute 4 has 8500 1s and 1500 0s, then the simulation-based posterior probability of Examinee 23 mastering Attribute 4 is 8500/10000=0.85 (Block 1709). Following the procedure, an examinee was declared a master of an attribute if the posterior probability was greater than 0.65 and a non-master if it was less than 0.35 (Block 1711). These mastery/non-mastery cutoffs may be modified in the practice of the present invention.
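Block 1709's estimate is simply the share of retained chain draws of α_jk that equal 1. A minimal sketch, with a hypothetical function name and a made-up chain that reproduces the Examinee 23 counts (a real chain interleaves 0s and 1s, but the order does not affect the fraction):

```python
def posterior_mastery(chain, burn_in=5000):
    """Fraction of post-burn-in draws of alpha_jk equal to 1 (Block 1709)."""
    kept = chain[burn_in:]
    return sum(kept) / len(kept)


# Hypothetical 15,000-draw chain whose 10,000 retained draws hold 8500 ones.
chain = [0] * 5000 + [1] * 8500 + [0] * 1500
print(posterior_mastery(chain))  # → 0.85
```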

The procedure performed extremely effectively, correctly diagnosing attribute mastery versus non-mastery in 96.1% of the examinee/attribute combinations for which a diagnosis was made (8 attributes times 500 examinees gives 4000 examinee/attribute combinations, minus the 176 combinations where a diagnosis was withheld because of weak evidence, that is, where the posterior probability was between 0.35 and 0.65). Considering that a test of modest length, with 40 multiple-choice items covering 8 attributes, was used, it is impressive that the cognitive diagnosis was so accurate. In fact, if stronger evidence is demanded by using 0.8 and 0.2 as cutoff values, the correct diagnosis rate increases to 97.6%, but diagnosis is withheld for 456 examinee/attribute combinations. This is strong scientific evidence that the procedure is effective as a cognitive diagnostic tool.
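The bookkeeping behind these rates is worked out below. The counts of correct diagnoses are derived from the reported percentages and rounded; they are not separately reported in the study.

$$
\begin{aligned}
8 \times 500 &= 4000 \text{ examinee/attribute combinations} \\
\text{cutoffs } 0.35/0.65:\quad 4000 - 176 &= 3824 \text{ diagnosed } (176/4000 = 4.4\% \text{ withheld}), \quad 0.961 \times 3824 \approx 3675 \text{ correct} \\
\text{cutoffs } 0.2/0.8:\quad 4000 - 456 &= 3544 \text{ diagnosed } (456/4000 = 11.4\% \text{ withheld}), \quad 0.976 \times 3544 \approx 3459 \text{ correct}
\end{aligned}
$$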

The item parameters were also well estimated (calibrated). The average difference between the estimated and true π* values, and between the estimated and true r* values, is 0.03 (both parameter types range from 0 to 1), and the average difference between the estimated and true c values is 0.3 (c ranges between 0 and 3). As expected, the c values were not estimated as well as the π* and r* values, because the exam was designed to have a highly cognitive structure (that is, to be relatively positive and complete) and to test a group of examinees modeled to understand the attributes well (that is, many of them are masters and hence can be expected to have relatively high θ values). Although the model is parametrically complex, it is possible to estimate the key parameters well and hence calibrate the model well. Because of this, there is no risk of being hurt by the variance/bias trade-off, as illustrated above by the example of data that truly follow a four-parameter cubic polynomial model. In that case, the situation could be misrepresented either by computing a reliable estimate of the one parameter in the biased linear model, or by computing unreliable estimates of the four parameters in the unbiased cubic polynomial model. By contrast, here in the UMCD simulation, the parameters of the complex and well-fitting UM are estimated well.

The constructs of positivity and completeness, as expressed through identifiable and easily interpretable parameters, are intuitively easy for the educational practitioner to grasp. Moreover, these constructs provide the practitioner with a realistic yet tractable way of modeling the inherent randomness of attribute-based examinee responding. Further, the introduction of the latent variable θ to handle incompleteness gives the educational practitioner enormous freedom in selecting which and, in particular, how many attributes to include explicitly in the UM-based cognitive model. Finally, allowing the user explicit control over attribute mastery levels is important, as is the positive attribute correlational structure assumed in the Bayes portion of the UM. In fact, the realization that one should choose a Bayesian model that presumes positively associated attributes through an appropriately chosen prior on the attributes solved a major practical problem in implementing the 1995 UM, namely its failure to take advantage of the fact that attributes are always positively correlated, a fact that is very useful (when used!) for achieving high accuracy in cognitive diagnosis. Indeed, simulation studies showed that Bayes UMs incorporating the positive correlational structure between attributes performed dramatically better than Bayes UMs without it. To be clear, one major contribution incorporated in the current version of the UM diagnostic approach is the realization that a probability modeling structure should be built that achieves positively correlated attributes, and that taking a Bayes probability modeling approach is an excellent way to do so.

In a real-data test/retest PSAT setting studied under a grant from the Educational Testing Service, the UMCD approach consistently classified over ⅔ of the examinees according to attribute mastery/nonmastery across the two tests (that is, both tests assigned mastery of an attribute, or both tests assigned nonmastery). This is particularly impressive because the PSAT is, by its very design, weak in providing cognitive information about specific attributes.

There are several reasons that the UMCD is distinguished from, and surpasses, these other approaches in cognitive diagnostic performance. As already explained, the other approaches use different models than the Bayes UM approach does. Further, the UMCD is the only model that simultaneously is statistically tractable; contains identifiable model parameters capable both of providing a good model fit of the data and of being easily interpreted by the user as having meaningful cognitive interpretations; specifies attribute mastery levels; incorporates into its cognitive diagnosis the positive association of attributes in the data; and is flexible, both in allowing various cognitive science perspectives and in incorporating predicted examinee error to produce suitable caution in cognitive inference. The other models can be unrealistic (because of their adherence to a particular cognitive modeling approach) in settings where that approach provides a poor description of the actual cognitive reality. They are often difficult to interpret because their parameters are not easily understood by users, especially by the typical educational practitioner. Moreover, many such models do not seem to fit the data particularly well, which is an absolute necessity for a statistical procedure to work effectively. And none of them address the fundamental concept of specifying attribute mastery.

Applying the UM Approach of the Present Invention to Medical/Psychiatric Diagnosis

Medical diagnostic models are useful for aiding the practitioner in coming up with a diagnosis consisting of a list of possible disorders, compiled based on the symptoms presented by a patient, but they are not a replacement for the practitioner. Thus, a good system will give a reasonably complete list of the probable disorders, although with enough patient information the number of disorders should be manageable.

FIG. 17b is a flow chart of the UM medical/psychiatric diagnostic procedure used in the present invention. It should be compared with the FIG. 17a flow chart, which gives the analogous UM procedure for cognitive diagnosis. The set of potential disorders replaces the set of attributes (Block 201′), and the set of symptoms and other patient characteristics, consisting of such things as dichotomized laboratory test values, age, race, sex, etc., replaces the items (Block 203′). θ is then a latent health or latent quality-of-life variable that combines all latent health and quality-of-life variables that are not potential disorders explicitly listed in the model. Then the UM is applied in exactly the same way as in the educational diagnostic setting (FIG. 17a). Specifically, symptoms/characteristics and disorders are defined (Blocks 201′ and 203′), and then an incidence matrix is constructed to indicate which disorders may be related to the presence of a particular symptom/characteristic (Block 205′). The item parameters of ω (as used in Blocks 1701, 1703, 1705, 1707′) are now symptom/characteristic parameters, and they can be accurately estimated if the data set used to calibrate the model (Block 207′) includes patients with known disorders, which would improve the accuracy of the symptom/characteristic parameter calibration (Block 1707′). A particular patient can then be assigned a list of disorders that he/she has a high enough probability of having (Block 1711′), based on the posterior probabilities calculated from the UM estimation program. The report of potential diagnoses to a practitioner may include the posterior probabilities assigned to each disorder (Block 1709′). The statistical analyses proceed similarly in both settings (Blocks 1701, 1703, 1705, 1707′, 1709′, 1711′). The diagnosis is then used to support the practitioner's diagnostic efforts (Block 1713′).

One thing that differs between this situation and the educational measurement situation (except in psychiatry) is that “gold standard” diagnoses exist for most disorders. Thus, the “symptom/characteristic calibration” can be done using patients who have known, and hence not latent, disorders.

Applying the UM of the Present Invention in Novel Settings Other than Educational or Medical/Psychiatric

FIG. 17c presents the flow chart of the present invention applied in a generic setting. FIG. 17c should be compared with the cognitive diagnostic flow chart of the present UMCD invention in FIG. 17a, applied in educational settings. The following correspondences (educational term: generic term) are required:

Attributes: Properties (Blocks 201″, 205″, 1709″, 1711″)
Test items: Probes (Blocks 203″, 205″, 207″, 1707″)
Item/attribute incidence matrix: Probe/property incidence matrix (Block 205″)
Cognitive diagnosis: Latent diagnosis (Block 1711″)

The statistical analyses proceed similarly in both settings (Blocks 1701, 1703, 1705, 1707″, 1709″, 1711″). Because the setting is generic, all that can be said about its application is that the latent diagnostic results would be used to make inferences, and possibly decisions, about the real-world setting in which the present invention is used.

A Semi-qualitative Description of the General Structure of the Equations and Relationships Undergirding the Present Invention

Equations 1, 2, 5, and 6 and the definitions of π*, r*, c, α, and θ are used to help explain the portions of the specific embodiment of the invention. The present invention is flowcharted in FIGS. 17a, 17b, and 17c, each flow chart for a different application. The terminology of cognitive diagnosis (FIG. 17a) will be used here for convenience, noting that the terminology of medical and psychiatric diagnosis (FIG. 17b) or of generic diagnosis (FIG. 17c) would function identically.

It is useful to describe the essential components of the present invention through an intermediate representation that is not tied to specific equations. Equations 1, 5, and 6, together with their identifiable (and hence calibratable) parameters r* and π*, provide one explication of the fact that (i) the probability of getting an item correct is increased by examinee mastery of all the attributes needed for the item, as contrasted with lacking one or more needed attributes. Further, (ii) the more needed attributes that are not mastered, the lower the probability of getting the item correct. Clauses (i) and (ii) qualitatively describe the concept of positivity of an item, which is expressed in one specific manner in the embodiment of the present invention. In general, any set of model equations may be used to capture the notion of positivity in a UM used in the present invention, provided the parameters of the equations are identifiable and substantively meaningful to the practitioner and express both (i) and (ii) above, or (i) alone.
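To make clauses (i) and (ii) concrete, the sketch below evaluates the S_ij = π_i* × Π_k (r_ik*)^(1−α_jk) form used in the claims for one hypothetical item requiring three attributes. With every r_ik* below 1 (the values here are made up for illustration), the probability is largest when all three attributes are mastered, and each additional non-mastered attribute multiplies it by another factor below 1.

```python
from itertools import product


def s_ij(pi_star, r_star, alpha):
    """pi_i* * prod_k (r_ik*)^(1 - alpha_jk) over the attributes the item requires."""
    p = pi_star
    for r, a in zip(r_star, alpha):
        p *= r ** (1 - a)  # factor is 1 if mastered (a = 1), r if not (a = 0)
    return p


if __name__ == "__main__":
    pi_star = 0.9
    r_star = [0.4, 0.5, 0.7]  # hypothetical values; each r* < 1 expresses positivity
    for alpha in product((1, 0), repeat=3):
        print(alpha, round(s_ij(pi_star, r_star, alpha), 4))
```

Running the loop lists all eight mastery patterns, from 0.9 for (1, 1, 1) down to 0.126 for (0, 0, 0).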

Modeling completeness for the UM is characterized by using one or a small number of latent variables to capture the effect on the probability of getting an item correct caused by all influential attributes not explicitly listed in the model via the incidence matrix (Blocks 205, 205′, and 205″). Any expression other than the P(θ_(j)+c_(i)) of the present invention that expresses the fact that attributes other than those explicitly listed in the UM incidence matrix can influence the probability of getting an item correct, and that captures this influence parsimoniously with one or a small number of latent variables, is an acceptable way to model UM completeness. The current embodiment specifies attribute mastery levels by setting the values of the parameters p_(k), as shown in the schematic of FIG. 16; this approach to setting mastery is tied to the Bayesian modeling approach of the present invention. However, any way for the user of an attribute-based cognitive procedure to set attribute mastery levels suffices.
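One way to see how the completeness term combines with the attribute-based part is to multiply S_ij by P(θ_(j)+c_(i)). The sketch assumes P is a logistic (Rasch-type) curve with a 1.7 scaling constant; that functional form, the function names, and the parameter values are assumptions for illustration, and as noted above any parsimonious expression of the residual influence would serve.

```python
import math


def completeness_term(theta, c, scale=1.7):
    """P(theta_j + c_i), assumed here to be logistic; the 1.7 constant is an assumption."""
    return 1.0 / (1.0 + math.exp(-scale * (theta + c)))


def prob_correct(pi_star, r_star, alpha, theta, c):
    """S_ij (attribute-based part) times the completeness term for residual ability."""
    s = pi_star
    for r, a in zip(r_star, alpha):
        s *= r ** (1 - a)
    return s * completeness_term(theta, c)
```

With a large c_i the completeness factor approaches 1, so the response probability depends almost entirely on the attributes listed in the incidence matrix; with a small c_i, residual ability θ_j matters more.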

Further, any way of modeling associations between attributes suffices; this does not have to be done in a Bayesian framework using the σ_(kk′) of FIG. 16.
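The mastery-level and association mechanism described in the claims (continuous bivariate normal precursors α′_k cut so that Prob(α′_k ≥ cut point) = p_k, with correlation σ_kk′ between precursors) can be simulated in a few lines. The sketch assumes standard normal precursors and uses Python's `statistics.NormalDist` for the inverse CDF; the values of p_k and σ are hypothetical, and the simulation only illustrates the data-generating side, not the Bayesian calibration of σ_kk′.

```python
import math
import random
from statistics import NormalDist


def cutpoint(p_k):
    """Cut point on a standard normal precursor so that P(alpha'_k >= cut) = p_k."""
    return NormalDist().inv_cdf(1.0 - p_k)


def simulate_pair(p1, p2, sigma, n, seed=0):
    """Draw n (alpha_1, alpha_2) pairs from standard normal precursors with correlation sigma.

    sigma must lie in [-1, 1]; each precursor is dichotomized at its own cut point.
    """
    rng = random.Random(seed)
    c1, c2 = cutpoint(p1), cutpoint(p2)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal with correlation sigma to z1.
        z2 = sigma * z1 + math.sqrt(1.0 - sigma ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((int(z1 >= c1), int(z2 >= c2)))
    return pairs
```

Raising p_k lowers the cut point, so more examinees are classed as masters; a positive σ makes mastery of one attribute more likely among masters of the other.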

Further, the fact that each item requires certain attributes for its successful solution could be expressed in ways other than a 0/1 incidence matrix (the current approach; see FIG. 19).
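For reference, a 0/1 incidence matrix and one alternative encoding carry the same information. The sketch below uses a hypothetical three-item, three-attribute matrix and converts it to a list of required-attribute sets and back; the names are illustrative only.

```python
# Hypothetical incidence matrix: Q[i][k] = 1 if item i requires attribute k.
Q = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]


def required_sets(Q):
    """Alternative encoding: for each item, the set of required attribute indices."""
    return [{k for k, q in enumerate(row) if q} for row in Q]


def to_matrix(sets, n_attributes):
    """Convert the set encoding back to a 0/1 incidence matrix."""
    return [[1 if k in s else 0 for k in range(n_attributes)] for s in sets]
```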

Thus, in summary, any way of expressing identifiable parameters for positivity and completeness, of specifying attribute mastery levels, of building into the model the tendency of attributes to be associated (positively in educational settings, and perhaps positively and/or negatively in other settings), and of expressing the dependence of each item on a subset of the specified attributes provides a way of expressing aspects of the claimed UMCD.

While a preferred application of the present invention is to use the UM, it should be understood that features of the present invention have non-UM-based applications to diagnostic modeling and diagnostic procedures. Specifically, any model concerning objects, usually people, with two-valued latent properties such as attributes or disorders may specify the level of possession of each property (such as the level of mastery, or the level of disorder judged to constitute a person having the disorder), and may further model a positive or negative association between properties such as attributes or disorders, thus allowing the sizes of the associations to be calibrated and the resulting estimates to be used to improve accuracy when carrying out diagnoses.

All of the above-referenced publications are incorporated herein by reference in their entirety.

1. A method comprising: constructing a test comprising test items and selecting a set of attributes designed to measure proficiency of examinees taking the test and that each examinee has or has not achieved mastery thereof; creating a mathematically expressed model comprising the test items and the selected attributes, the selected attributes being a subset of a larger group of attributes influencing examinee test item performance with an unspecified remainder of the larger group of attributes being accounted for in the model by a residual ability parameter, the model including parameters describing how the test items depend on the selected set of attributes and how the test items also depend on the residual ability parameter in such a manner that examinee responses to test items provide estimation information about each parameter permitting calibration thereof and provide predictions of which attributes the examinees have or have not achieved mastery thereof, the model further accounting for a probability that each examinee for each individual test item may achieve mastery of all the attributes from the subset of the selected set of attributes required for the individual test item but fail to apply at least one required and mastered attribute correctly to the individual test item thereby responding to the test item incorrectly and that each examinee for each individual test item may have failed to achieve mastery of at least one required specified attribute for the item and nevertheless apply each required specified attribute for which mastery was not achieved correctly to the item and also applying the remaining required and mastered attributes from the selected set of attributes correctly to the item thereby responding to the test item correctly, the model defining mastery of each attribute to be an assigned level representing that an examinee exceeding the level has acquired attribute competence thereof, and the model expressing for pairs of the set of selected attributes a positive association between the two members of each of the pairs and further expressing a size measure of the positive association of each pair of attributes that can be estimated for each pair from the examinee responses to individual test items; and applying test results obtained from responses of the examinees to calibrate the individual test items of the model and to generate a prediction of attribute mastery, a prediction of failure to achieve mastery, or a withholding of any prediction for each individual examinee and individual specified attribute combination.
2. A method in accordance with claim 1 comprising: constructing a test comprising test items, X_(ij)=0 or 1 according as Examinee j gets Item i wrong or right respectively, and selecting a set of attributes {α_(jk)} with α_(jk)=0 or 1 according as Examinee j has failed to master or has mastered Attribute k, respectively; and creating a mathematically expressed model that includes identifiable and hence capable of being calibrated parameters {π*, r*} describing how the test items depend on the selected set of attributes according to the following probability:

$$S_{ij} = \pi_i^{*} \times (r_{i1}^{*})^{1-\alpha_{j1}} \times (r_{i2}^{*})^{1-\alpha_{j2}} \times \cdots \times (r_{im}^{*})^{1-\alpha_{jm}}$$

with S_(ij) being the probability of applying all the required attributes correctly as determined by examinee mastery and nonmastery of these required attributes, the product of r*'s in S_(ij) over the m attributes required for Item i as specified by an item/attribute incidence matrix, π*_(i)=Π(π_(ik)) with the product over k, r*_(ik)=r_(ik)/π_(ik), with r_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has not mastered Attribute k), π_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has mastered Attribute k), expressing the size measure of the positive association of each pair of attributes indirectly as determined by the correlation σ_(kk′) between continuous bivariate normal attribute precursors α′_(k), α′_(k′) to dichotomous attributes α_(k), α_(k′), the needed dichotomous attribute pairs α_(k), α_(k′) then produced by cutting each α′_(k) at a specified mastery level cutpoint such that the Attribute k mastery probability p_(k) that (α_(k)=1) is defined to equal Prob(α′_(k)≧cut point) with the level of attribute mastery thus defined by selecting the cut point for α′_(k) with p_(k) being the user determined proportion of examinees judged to have mastered Attribute k, thereby each attribute through the continuous α′_(k) and its cut point having an assigned level representing that an examinee exceeding that level has acquired attribute mastery thereof.
3. A method comprising: constructing a test comprising test items and selecting a set of attributes designed to measure proficiency of examinees taking the test and that each examinee has or has not achieved mastery thereof; creating a mathematically expressed model comprising the test items and the selected attributes, the selected attributes being a subset of a larger group of attributes influencing examinee test item performance with an unspecified remainder of the larger group of attributes being accounted for in the model by a residual ability parameter, the model including parameters describing how the test items depend on the selected set of attributes and how the test items also depend on the residual ability parameter in such a manner that examinee responses to test items provide estimation information about each parameter permitting calibration thereof and provide predictions of which attributes the examinees have or have not achieved mastery thereof, the model further accounting for a probability that each examinee for each individual test item may achieve mastery of all the attributes from the subset of the selected set of attributes required for the individual test item but fail to apply at least one required and mastered attribute correctly to the individual test item thereby responding to the test item incorrectly and that each examinee for each individual test item may have failed to achieve mastery of at least one required specified attribute for the item and nevertheless apply each required specified attribute for which mastery was not achieved correctly to the item and also applying the remaining required and mastered attributes from the selected set of attributes correctly to the item thereby responding to the test item correctly, and the model expressing for pairs of the set of selected attributes a positive association between the two members of each of the pairs and further expressing a size measure of the positive association of each pair of attributes that can be estimated for each pair from the examinee responses to individual test items; and applying test results obtained from responses of the examinees to calibrate the individual test items of the model and to generate a prediction of attribute mastery, a prediction of failure to achieve mastery, or a withholding of any prediction for each individual examinee and individual specified attribute combination.
4. A method in accordance with claim 3 comprising: constructing a test comprising test items, X_(ij)=0 or 1 according as Examinee j gets Item i wrong or right respectively, and selecting a set of attributes {α_(jk)} with α_(jk)=0 or 1 according as Examinee j has failed to master or has mastered Attribute k, respectively; and creating a mathematically expressed model that includes identifiable and hence capable of being calibrated parameters {π*, r*} describing how the test items depend on the selected set of attributes according to the following probability:

$$S_{ij} = \pi_i^{*} \times (r_{i1}^{*})^{1-\alpha_{j1}} \times (r_{i2}^{*})^{1-\alpha_{j2}} \times \cdots \times (r_{im}^{*})^{1-\alpha_{jm}}$$

with S_(ij) being the probability of applying all the required attributes correctly as determined by examinee mastery and nonmastery of these required attributes, the product of r*'s in S_(ij) over the m attributes required for Item i as specified by an item/attribute incidence matrix, π*_(i)=Π(π_(ik)) with the product over k, r*_(ik)=r_(ik)/π_(ik), with r_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has not mastered Attribute k), π_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has mastered Attribute k), expressing the size measure of the positive association of each pair of attributes indirectly as determined by the correlation σ_(kk′) between continuous bivariate normal attribute precursors α′_(k), α′_(k′) to dichotomous attributes α_(k), α_(k′), the needed dichotomous attribute pairs α_(k), α_(k′) then produced by cutting each α′_(k) at a cutpoint.
5. A method comprising: constructing a test comprising test items and selecting a set of attributes designed to measure proficiency of examinees taking the test and that each examinee has or has not achieved mastery thereof; creating a mathematically expressed model comprising the test items and the selected attributes, the selected attributes being a subset of a larger group of attributes influencing examinee test item performance with an unspecified remainder of the larger group of attributes being accounted for in the model by a residual ability parameter, the model including parameters describing how the test items depend on the selected set of attributes and how the test items also depend on the residual ability parameter in such a manner that examinee responses to test items provide estimation information about each parameter permitting calibration thereof and provide predictions of which attributes the examinees have or have not achieved mastery thereof, the model further accounting for a probability that each examinee for each individual test item may achieve mastery of all the attributes from the subset of the selected set of attributes required for the individual test item but fail to apply at least one required and mastered attribute correctly to the individual test item thereby responding to the test item incorrectly and that each examinee for each individual test item may have failed to achieve mastery of at least one required specified attribute for the item and nevertheless apply each required specified attribute for which mastery was not achieved correctly to the item and also applying the remaining required and mastered attributes from the selected set of attributes correctly to the item thereby responding to the test item correctly, and the model defining mastery of each attribute to be an assigned level representing that an examinee exceeding the level has acquired attribute competence thereof, and applying test results obtained from responses of the examinees to calibrate the individual test items of the model and to generate a prediction of attribute mastery, a prediction of failure to achieve mastery, or a withholding of any prediction for each individual examinee and individual specified attribute combination.
6. A method in accordance with claim 5 comprising: constructing a test comprising test items, X_(ij)=0 or 1 according as Examinee j gets Item i wrong or right respectively, and selecting a set of attributes {α_(jk)} with α_(jk)=0 or 1 according as Examinee j has failed to master or has mastered Attribute k, respectively; and creating a mathematically expressed model that includes identifiable and hence capable of being calibrated parameters {π*, r*} describing how the test items depend on the selected set of attributes according to the following probability:

$$S_{ij} = \pi_i^{*} \times (r_{i1}^{*})^{1-\alpha_{j1}} \times (r_{i2}^{*})^{1-\alpha_{j2}} \times \cdots \times (r_{im}^{*})^{1-\alpha_{jm}}$$

with S_(ij) being the probability of applying all the required attributes correctly as determined by examinee mastery and nonmastery of these required attributes, the product of r*'s in S_(ij) over the m attributes required for Item i as specified by an item/attribute incidence matrix, π*_(i)=Π(π_(ik)) with the product over k, r*_(ik)=r_(ik)/π_(ik), with r_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has not mastered Attribute k), π_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has mastered Attribute k), with p_(k) being a user determined proportion of examinees judged to have mastered Attribute k.
7. A method comprising: constructing a test comprising test items and selecting a set of attributes designed to measure proficiency of examinees taking the test and that each examinee has or has not achieved mastery thereof; creating a mathematically expressed model comprising the test items and the selected attributes, the selected attributes being a subset of a larger group of attributes influencing examinee test item performance with an unspecified remainder of the larger group of attributes being accounted for in the model by a residual ability parameter, the model including parameters describing how the test items depend on the selected set of attributes and how the test items also depend on the residual ability parameter and provide predictions of which attributes the examinees have or have not achieved mastery thereof, the model further accounting for a probability that each examinee for each individual test item may achieve mastery of all the attributes from the subset of the selected set of attributes required for the individual test item but fail to apply at least one required and mastered attribute correctly to the individual test item thereby responding to the test item incorrectly and that each examinee for each individual test item may have failed to achieve mastery of at least one required specified attribute for the item and nevertheless apply each required specified attribute for which mastery was not achieved correctly to the item and also applying the remaining required and mastered attributes from the selected set of attributes correctly to the item thereby responding to the test item correctly, the model defining mastery of each attribute to be an assigned level representing that an examinee exceeding the level has acquired attribute competence thereof, and the model expressing for pairs of the set of selected attributes a positive association between the two members of each of the pairs and further expressing a size measure of the positive association of each pair of attributes that can be estimated for each pair from the examinee responses to individual test items; and applying test results obtained from responses of the examinees to calibrate the individual test items of the model and to generate a prediction of attribute mastery, a prediction of failure to achieve mastery, or a withholding of any prediction for each individual examinee and individual specified attribute combination.
8. A method in accordance with claim 7 comprising: constructing a test comprising test items, X_(ij)=0 or 1 according as Examinee j gets Item i wrong or right respectively, and selecting a set of attributes {α_(jk)} with α_(jk)=0 or 1 according as Examinee j has failed to master or has mastered Attribute k, respectively; and creating a mathematically expressed model that includes parameters {π*, r*} describing how the test items depend on the selected set of attributes according to the following probability:

$$S_{ij} = \pi_i^{*} \times (r_{i1}^{*})^{1-\alpha_{j1}} \times (r_{i2}^{*})^{1-\alpha_{j2}} \times \cdots \times (r_{im}^{*})^{1-\alpha_{jm}}$$

with S_(ij) being the probability of applying all the required attributes correctly as determined by examinee mastery and nonmastery of these required attributes, the product of r*'s in S_(ij) over the m attributes required for Item i as specified by an item/attribute incidence matrix, π*_(i)=Π(π_(ik)) with the product over k, r*_(ik)=r_(ik)/π_(ik), with r_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has not mastered Attribute k), π_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has mastered Attribute k), expressing the size measure of the positive association of each pair of attributes indirectly as determined by the correlation σ_(kk′) between continuous bivariate normal attribute precursors α′_(k), α′_(k′) to dichotomous attributes α_(k), α_(k′), the needed dichotomous attribute pairs α_(k), α_(k′) then produced by cutting each α′_(k) at a specified mastery level cutpoint such that the Attribute k mastery probability p_(k) that (α_(k)=1) is defined to equal Prob(α′_(k)≧cut point) with the level of attribute mastery thus defined by selecting the cut point for α′_(k) with p_(k) being the user determined proportion of examinees judged to have mastered Attribute k, thereby each attribute through the continuous α′_(k) and its cut point having an assigned level representing that an examinee exceeding that level has acquired attribute mastery thereof.
9. A method comprising: constructing a test comprising test items and selecting a set of attributes designed to measure proficiency of examinees taking the test and that each examinee has or has not achieved mastery thereof; creating a mathematically expressed model comprising the test items and the selected attributes, the selected attributes being a subset of a larger group of attributes influencing examinee test item performance with an unspecified remainder of the larger group of attributes being accounted for in the model by a residual ability parameter, the model including parameters describing how the test items depend on the selected set of attributes and how the test items also depend on the residual ability parameter in such a manner that examinee responses to test items provide estimation information about each parameter permitting calibration thereof and provide predictions of which attributes the examinees have or have not achieved mastery thereof, the model further accounting for a probability that each examinee for each individual test item may achieve mastery of all the attributes from the subset of the selected set of attributes required for the individual test item but fail to apply at least one required and mastered attribute correctly to the individual test item thereby responding to the test item incorrectly and that each examinee for each individual test item may have failed to achieve mastery of at least one required specified attribute for the item and nevertheless apply each required specified attribute for which mastery was not achieved correctly to the item and also applying the remaining required and mastered attributes from the selected set of attributes correctly to the item thereby responding to the test item correctly; applying test results obtained from responses of the examinees to calibrate the individual test items of the model and to generate a prediction of attribute mastery, a prediction of failure to achieve mastery, or a withholding of any prediction for each individual examinee and individual specified attribute combination.
10. A method in accordance with claim 9 comprising: constructing a test comprising test items, X_(ij)=0 or 1 according as Examinee j gets Item i wrong or right respectively, and selecting a set of attributes {α_(jk)} with α_(jk)=0 or 1 according as Examinee j has failed to master or has mastered Attribute k, respectively; and creating a mathematically expressed model that includes identifiable and hence capable of being calibrated parameters {π*, r*} describing how the test items depend on the selected set of attributes according to the following probability:

$$S_{ij} = \pi_i^{*} \times (r_{i1}^{*})^{1-\alpha_{j1}} \times (r_{i2}^{*})^{1-\alpha_{j2}} \times \cdots \times (r_{im}^{*})^{1-\alpha_{jm}}$$

with S_(ij) being the probability of applying all the required attributes correctly as determined by examinee mastery and nonmastery of these required attributes, the product of r*'s in S_(ij) over the m attributes required for Item i as specified by an item/attribute incidence matrix, π*_(i)=Π(π_(ik)) with the product over k, r*_(ik)=r_(ik)/π_(ik), with r_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has not mastered Attribute k), π_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has mastered Attribute k).
11. A method comprising: constructing a test comprising test items and selecting a set of attributes designed to measure proficiency of examinees taking the test and that each examinee has or has not achieved mastery thereof; creating a mathematically expressed model comprising the test items and the selected attributes, the selected attributes being a subset of a larger group of attributes influencing examinee test item performance with an unspecified remainder of the larger group of attributes being accounted for in the model by a residual ability parameter, the model including parameters describing how the test items depend on the selected set of attributes and how the test items also depend on the residual ability parameter and provide predictions of which attributes the examinees have or have not achieved mastery thereof, the model further accounting for a probability that each examinee for each individual test item may achieve mastery of all the attributes from the subset of the selected set of attributes required for the individual test item but fail to apply at least one required and mastered attribute correctly to the individual test item thereby responding to the test item incorrectly and that each examinee for each individual test item may have failed to achieve mastery of at least one required specified attribute for the item and nevertheless apply each required specified attribute for which mastery was not achieved correctly to the item and also applying the remaining required and mastered attributes from the selected set of attributes correctly to the item thereby responding to the test item correctly, and the model defining mastery of each attribute to be an assigned level representing that an examinee exceeding the level has acquired attribute competence thereof; and applying test results obtained from responses of the examinees to calibrate the individual test items of the model and to generate a prediction of attribute mastery, a prediction of failure to achieve mastery, or a withholding of any prediction for each individual examinee and individual specified attribute combination.
12. A method in accordance with claim 11 comprising: constructing a test comprising test items, X_(ij)=0 or 1 according as Examinee j gets Item i wrong or right respectively, and selecting a set of attributes {α_(jk)} with α_(jk)=0 or 1 according as Examinee j has failed to master or has mastered Attribute k, respectively; and creating a mathematically expressed model that includes parameters {π*, r*} describing how the test items depend on the selected set of attributes according to the following probability:

$$S_{ij} = \pi_i^{*} \times (r_{i1}^{*})^{1-\alpha_{j1}} \times (r_{i2}^{*})^{1-\alpha_{j2}} \times \cdots \times (r_{im}^{*})^{1-\alpha_{jm}}$$

with S_(ij) being the probability of applying all the required attributes correctly as determined by examinee mastery and nonmastery of these required attributes, the product of r*'s in S_(ij) over the m attributes required for Item i as specified by an item/attribute incidence matrix, π*_(i)=Π(π_(ik)) with the product over k, r*_(ik)=r_(ik)/π_(ik), with r_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has not mastered Attribute k), π_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has mastered Attribute k), with p_(k) being a user determined proportion of examinees judged to have mastered Attribute k.
13. A method comprising: constructing a test comprising test items and selecting a set of attributes designed to measure proficiency of examinees taking the test and that each examinee has or has not achieved mastery thereof; creating a mathematically expressed model comprising the test items and the selected attributes, the selected attributes being a subset of a larger group of attributes influencing examinee test item performance with an unspecified remainder of the larger group of attributes being accounted for in the model by a residual ability parameter, the model including parameters describing how the test items depend on the selected set of attributes and how the test items also depend on the residual ability and provide predictions of which attributes the examinees have or have not achieved mastery thereof, the model further accounting for a probability that each examinee for each individual test item may achieve mastery of all the attributes from the subset of the selected set of attributes required for the individual test item but fail to apply at least one required and mastered attribute correctly to the individual test item thereby responding to the test item incorrectly and that each examinee for each individual test item may have failed to achieve mastery of at least one required specified attribute for the item and nevertheless apply each required specified attribute for which mastery was not achieved correctly to the item and also applying the remaining required and mastered attributes from the selected set of attributes correctly to the item thereby responding to the test item correctly, and the model expressing for pairs of the set of selected attributes a positive association between the two members of each of the pairs and further expressing a size measure of the positive association of each pair of attributes that can be estimated for each pair from the examinee responses to individual test items; applying test results obtained from responses of the examinees to calibrate the individual test items of the model and to generate a prediction of attribute mastery, a prediction of failure to achieve mastery, or a withholding of any prediction for each individual examinee and individual specified attribute combination.
14. A method in accordance with claim 13 comprising: constructing a test comprising test items, X_(ij)=0 or 1 according as Examinee j gets Item i wrong or right respectively, and selecting a set of attributes {α_(jk)} with α_(jk)=0 or 1 according as Examinee j has failed to master or has mastered Attribute k, respectively; and creating a mathematically expressed model that includes parameters {π*, r*} describing how the test items depend on the selected set of attributes according to the following probability:

$$S_{ij} = \pi_i^{*} \times (r_{i1}^{*})^{1-\alpha_{j1}} \times (r_{i2}^{*})^{1-\alpha_{j2}} \times \cdots \times (r_{im}^{*})^{1-\alpha_{jm}}$$

with S_(ij) being the probability of applying all the required attributes correctly as determined by examinee mastery and nonmastery of these required attributes, the product of r*'s in S_(ij) over the m attributes required for Item i as specified by an item/attribute incidence matrix, π*_(i)=Π(π_(ik)) with the product over k, r*_(ik)=r_(ik)/π_(ik), with r_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has not mastered Attribute k), π_(ik)=Prob(Attribute k applied correctly to Item i given that the examinee has mastered Attribute k), expressing the size measure of the positive association of each pair of attributes indirectly as determined by the correlation σ_(kk′) between continuous bivariate normal attribute precursors α′_(k), α′_(k′) to dichotomous attributes α_(k), α_(k′), the needed dichotomous attribute pairs α_(k), α_(k′) then produced by cutting each α′_(k) at a specified cutpoint.
15. A method comprising constructing a set of medically or psychiatrically focused dichotomizable patient symptoms or dichotomizable personal characteristics, and selecting for evaluation a specified set of possible medical or psychiatric disorders that each patient has or does not have, with multiple disorders per patient being possible; creating a mathematically expressed model comprising the symptoms or characteristics and the specified disorders selected for evaluation, with a latent health or quality of life parameter representing latent aspects of the patient not included in the specified set of disorders, the model including parameters describing how the symptoms or characteristics depend on the specified set of disorders and how the symptoms or characteristics also depend on a latent general health or quality of life parameter in such a manner that the patient symptoms or characteristics provide estimation information about each parameter permitting calibration thereof and predictions of the likelihood of the possible disorders, the model further accounting for a probability that a patient may possess a set of symptoms or characteristics representative of a disorder and yet the patient does not have the disorder and a patient may lack at least one of the symptoms or characteristics typical of the disorder and yet the patient has the disorder, for some psychiatric and medical disorders the model defining a level judged as constituting having the disorder for each of the some psychiatric and medical disorders, and the model expressing for some pairs among all pairs of the selected set of disorders an association, either positive or negative, of each of the same pairs and further expressing a size measure of the association of each pair of the selected set of disorders that can be estimated from the patient responses to the symptoms or characteristics; and using the model to which patient data is applied to generate predictions of probabilities of patients having each of the disorders in the specified set of possible disorders.
16. A method in accordance with claim 15 comprising: constructing a medical or psychiatric diagnosis comprising observed symptoms/characteristics, $X_{ij} = 0$ or $1$ according as Patient $j$ does not or does display Symptom/characteristic $i$, respectively, and selecting a set of possible disorders $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Patient $j$ does or does not have Disorder $k$, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$, identifiable and hence capable of being calibrated, describing how the symptoms/characteristics depend on the selected set of disorders according to the following probability:
$$S_{ij} = \pi^*_i \times (r^*_{i1})^{1-\alpha_{j1}} \times (r^*_{i2})^{1-\alpha_{j2}} \times \cdots \times (r^*_{im})^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of displaying Symptom/characteristic $i$ as determined by the disorders Patient $j$ has and does not have, assuming completeness with respect to the latent health/quality-of-life variable, the product of the $r^*$ terms being taken over the $m$ disorders associated with Symptom/characteristic $i$ as specified by the incidence matrix, $\pi^*_i = \prod_k \pi_{ik}$, and $r^*_{ik} = r_{ik}/\pi_{ik}$, where $r_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient has Disorder $k$) and $\pi_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient does not have Disorder $k$); and expressing the size measure of the association of each pair of disorders indirectly as determined by the correlation $\sigma_{kk'}$ between continuous bivariate normal disorder precursors $\alpha'_k, \alpha'_{k'}$ to the dichotomous disorders $\alpha_k, \alpha_{k'}$, the needed dichotomous disorder pairs $\alpha_k, \alpha_{k'}$ then being produced by cutting each $\alpha'_k$ at a specified disorder cut point such that the Disorder $k$ possession probability $p_k$ is defined to equal $\mathrm{Prob}(\alpha'_k \geq \text{cut point})$, the level of $\alpha'_k$ judged to constitute having the disorder ($\alpha_k = 0$) thus being defined by selecting the cut point for $\alpha'_k$, with $p_k$ being the proportion of the patients judged to have the disorder as determined by the setting of the $\alpha'_k$ cut point, each disorder thereby having, through the continuous $\alpha'_k$ and its cut point, an assigned level such that a patient exceeding that level has the disorder.
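As a minimal sketch of the probability in claim 16, the Python below computes $S_{ij}$ for one symptom from hypothetical raw values $r_{ik}$ and $\pi_{ik}$. Under the claim's coding, $\alpha_{jk} = 0$ means Patient $j$ has Disorder $k$, so each disorder the patient has swaps the factor $\pi_{ik}$ inside $\pi^*_i$ for $r_{ik}$. The function names and numbers are illustrative only, and the latent health/quality-of-life variable is left out, matching the completeness assumption stated in the claim.

```python
from math import prod

def reparameterize(r, pi):
    """Map raw (r_ik, pi_ik) for the m disorders tied to Symptom i to (pi_i*, [r_ik*]).

    r[k]  = Prob(symptom i | patient HAS Disorder k)
    pi[k] = Prob(symptom i | patient does NOT have Disorder k)
    """
    pi_star = prod(pi)                              # pi_i* = product over k of pi_ik
    r_star = [rk / pk for rk, pk in zip(r, pi)]     # r_ik* = r_ik / pi_ik
    return pi_star, r_star

def symptom_prob(pi_star, r_star, alpha):
    """S_ij = pi_i* x prod_k (r_ik*)^(1 - alpha_jk).

    alpha[k] = 0 means Patient j HAS Disorder k and 1 means it does not,
    following the coding used in the claim.
    """
    s = pi_star
    for rk_star, a in zip(r_star, alpha):
        s *= rk_star ** (1 - a)                     # factor applies only if the disorder is present
    return s

# Hypothetical values for one symptom tied to two disorders.
pi_star, r_star = reparameterize(r=[0.9, 0.8], pi=[0.2, 0.3])
has_both = symptom_prob(pi_star, r_star, alpha=[0, 0])      # equals r_i1 * r_i2
has_neither = symptom_prob(pi_star, r_star, alpha=[1, 1])   # equals pi_i1 * pi_i2
```

In this medical coding $r^*_{ik}$ is typically greater than 1, because a symptom is usually more likely when the disorder is present; the product can never exceed 1, since it always equals a product of probabilities.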
17. A method comprising constructing a set of medically or psychiatrically focused dichotomizable patient symptoms or dichotomizable personal characteristics, and selecting for evaluation a specified set of possible medical or psychiatric disorders that each patient has or does not have, with multiple disorders per patient being possible; creating a mathematically expressed model comprising the symptoms or characteristics and the specified disorders selected for evaluation, with a latent health or quality-of-life parameter representing latent aspects of the patient not included in the specified set of disorders, the model including parameters describing how the symptoms or characteristics depend on the specified set of disorders and how the symptoms or characteristics also depend on a latent general health or quality-of-life parameter in such a manner that the patient symptoms or characteristics provide estimation information about each parameter, permitting calibration thereof and predictions of the likelihood of the possible disorders, the model further accounting for a probability that a patient may possess a set of symptoms or characteristics representative of a disorder and yet not have the disorder, and that a patient may lack at least one of the symptoms or characteristics typical of the disorder and yet have the disorder, and the model expressing, for some of the pairs of the selected set of disorders, an association, either positive or negative, between the members of each such pair and further expressing a size measure of the association of each pair of the selected set of disorders that can be estimated from the patient responses to the symptoms or characteristics; and using the model, with patient data applied to it, to generate predictions of the probabilities of patients having each of the disorders in the specified set of possible disorders.
18. A method in accordance with claim 17 comprising: constructing a medical or psychiatric diagnosis comprising observed symptoms/characteristics, $X_{ij} = 0$ or $1$ according as Patient $j$ does not or does display Symptom/characteristic $i$, respectively, and selecting a set of possible disorders $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Patient $j$ does or does not have Disorder $k$, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$, identifiable and hence capable of being calibrated, describing how the symptoms/characteristics depend on the selected set of disorders according to the following probability:
$$S_{ij} = \pi^*_i \times (r^*_{i1})^{1-\alpha_{j1}} \times (r^*_{i2})^{1-\alpha_{j2}} \times \cdots \times (r^*_{im})^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of displaying Symptom/characteristic $i$ as determined by the disorders Patient $j$ has and does not have, assuming completeness with respect to the latent health/quality-of-life variable, the product of the $r^*$ terms being taken over the $m$ disorders associated with Symptom/characteristic $i$ as specified by the incidence matrix, $\pi^*_i = \prod_k \pi_{ik}$, and $r^*_{ik} = r_{ik}/\pi_{ik}$, where $r_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient has Disorder $k$) and $\pi_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient does not have Disorder $k$); and expressing the size measure of the association of each pair of disorders indirectly as determined by the correlation $\sigma_{kk'}$ between continuous bivariate normal disorder precursors $\alpha'_k, \alpha'_{k'}$ to the dichotomous disorders $\alpha_k, \alpha_{k'}$, the needed dichotomous disorder pairs $\alpha_k, \alpha_{k'}$ then being produced by cutting each $\alpha'_k$ at a cut point.
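The association in claim 18 is expressed only indirectly, through $\sigma_{kk'}$. The sketch below shows how that correlation carries over to the dichotomous disorders. It assumes standard (mean 0, variance 1) bivariate normal precursors and treats the region at or above each cut point as the disorder being present, following the reading given in claim 16. It numerically integrates the joint probability that both precursors exceed their cut points and converts that to a phi coefficient between the two dichotomous disorders. The function names and the midpoint-rule integration are illustrative choices, not part of the claims.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # precursors assumed standardized (mean 0, sd 1)

def joint_above_cut_prob(sigma, cut_k, cut_kp, steps=20000, span=10.0):
    """Prob(alpha'_k >= cut_k and alpha'_k' >= cut_kp) for standard bivariate
    normal precursors with correlation sigma, computed by the midpoint rule."""
    if not -1.0 < sigma < 1.0:
        raise ValueError("sigma must lie strictly between -1 and 1")
    lo = cut_k
    hi = max(cut_k, 0.0) + span          # the normal tail beyond this is negligible
    h = (hi - lo) / steps
    cond_sd = sqrt(1.0 - sigma ** 2)     # sd of alpha'_k' given alpha'_k = x
    total = 0.0
    for n in range(steps):
        x = lo + (n + 0.5) * h
        # alpha'_k' given alpha'_k = x is normal with mean sigma * x and sd cond_sd
        total += STD_NORMAL.pdf(x) * (1.0 - STD_NORMAL.cdf((cut_kp - sigma * x) / cond_sd))
    return total * h

def phi_coefficient(sigma, cut_k, cut_kp):
    """Correlation between the two dichotomized disorders implied by sigma."""
    p_k = 1.0 - STD_NORMAL.cdf(cut_k)
    p_kp = 1.0 - STD_NORMAL.cdf(cut_kp)
    p_both = joint_above_cut_prob(sigma, cut_k, cut_kp)
    return (p_both - p_k * p_kp) / sqrt(p_k * (1 - p_k) * p_kp * (1 - p_kp))
```

With $\sigma_{kk'} = 0.5$ and both cut points at 0, the joint probability is $1/4 + \arcsin(0.5)/(2\pi) = 1/3$ and the phi coefficient is $1/3$. A positive precursor correlation thus yields a positive association between the dichotomized disorders, here smaller than $\sigma_{kk'}$ itself.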
19. A method comprising constructing a set of medically or psychiatrically focused dichotomizable patient symptoms or dichotomizable personal characteristics, and selecting for evaluation a specified set of possible medical or psychiatric disorders that each patient has or does not have, with multiple disorders per patient being possible; creating a mathematically expressed model comprising the symptoms or characteristics and the specified disorders selected for evaluation, with a latent health or quality-of-life parameter representing latent aspects of the patient not included in the specified set of disorders, the model including parameters describing how the symptoms or characteristics depend on the specified set of disorders and how the symptoms or characteristics also depend on a latent general health or quality-of-life parameter in such a manner that the patient symptoms or characteristics provide estimation information about each parameter, permitting calibration thereof and predictions of the likelihood of the possible disorders, the model further accounting for a probability that a patient may possess a set of symptoms or characteristics representative of a disorder and yet not have the disorder, and that a patient may lack at least one of the symptoms or characteristics typical of the disorder and yet have the disorder, the model defining, for each of some of the psychiatric and medical disorders, a level judged as constituting having that disorder; and using the model, with patient data applied to it, to generate predictions of the probabilities of patients having each of the disorders in the specified set of possible disorders.
20. A method in accordance with claim 19 comprising: constructing a medical or psychiatric diagnosis comprising observed symptoms/characteristics, $X_{ij} = 0$ or $1$ according as Patient $j$ does not or does display Symptom/characteristic $i$, respectively, and selecting a set of possible disorders $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Patient $j$ does or does not have Disorder $k$, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$, identifiable and hence capable of being calibrated, describing how the symptoms/characteristics depend on the selected set of disorders according to the following probability:
$$S_{ij} = \pi^*_i \times (r^*_{i1})^{1-\alpha_{j1}} \times (r^*_{i2})^{1-\alpha_{j2}} \times \cdots \times (r^*_{im})^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of displaying Symptom/characteristic $i$ as determined by the disorders Patient $j$ has and does not have, assuming completeness with respect to the latent health/quality-of-life variable, the product of the $r^*$ terms being taken over the $m$ disorders associated with Symptom/characteristic $i$ as specified by the incidence matrix, $\pi^*_i = \prod_k \pi_{ik}$, and $r^*_{ik} = r_{ik}/\pi_{ik}$, where $r_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient has Disorder $k$) and $\pi_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient does not have Disorder $k$), with $p_k$ being the proportion of the patients judged to have the disorder as determined by the user's definition of the disorder.
21. A method comprising constructing a set of medically or psychiatrically focused dichotomizable patient symptoms or dichotomizable personal characteristics, and selecting for evaluation a specified set of possible medical or psychiatric disorders that each patient has or does not have, with multiple disorders per patient being possible; creating a mathematically expressed model comprising the symptoms or characteristics and the specified disorders selected for evaluation, with a latent health or quality-of-life parameter representing latent aspects of the patient not included in the specified set of disorders, the model including parameters describing how the symptoms or characteristics depend on the specified set of disorders and how the symptoms or characteristics also depend on a latent general health or quality-of-life parameter, the model further accounting for a probability that a patient may possess a set of symptoms or characteristics typical of a disorder and yet not have the disorder, and that a patient may lack at least one of the symptoms or characteristics representative of the disorder and yet have the disorder, the model defining, for each of some of the psychiatric and medical disorders, a level judged as constituting having that disorder, and the model expressing, for some of the pairs of the selected set of disorders, an association, either positive or negative, between the members of each such pair and further expressing a size measure of the association of each pair of the selected set of disorders that can be estimated from the patient responses to the symptoms or characteristics; and using the model, with patient data applied to it, to generate predictions of the probabilities of patients having each of the disorders in the specified set of possible disorders.
22. A method in accordance with claim 21 comprising: constructing a medical or psychiatric diagnosis comprising observed symptoms/characteristics, $X_{ij} = 0$ or $1$ according as Patient $j$ does not or does display Symptom/characteristic $i$, respectively, and selecting a set of possible disorders $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Patient $j$ does or does not have Disorder $k$, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$ describing how the symptoms/characteristics depend on the selected set of disorders according to the following probability:
$$S_{ij} = \pi^*_i \times (r^*_{i1})^{1-\alpha_{j1}} \times (r^*_{i2})^{1-\alpha_{j2}} \times \cdots \times (r^*_{im})^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of displaying Symptom/characteristic $i$ as determined by the disorders Patient $j$ has and does not have, assuming completeness with respect to the latent health/quality-of-life variable, the product of the $r^*$ terms being taken over the $m$ disorders associated with Symptom/characteristic $i$ as specified by the incidence matrix, $\pi^*_i = \prod_k \pi_{ik}$, and $r^*_{ik} = r_{ik}/\pi_{ik}$, where $r_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient has Disorder $k$) and $\pi_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient does not have Disorder $k$); and expressing the size measure of the association of each pair of disorders indirectly as determined by the correlation $\sigma_{kk'}$ between continuous bivariate normal disorder precursors $\alpha'_k, \alpha'_{k'}$ to the dichotomous disorders $\alpha_k, \alpha_{k'}$, the needed dichotomous disorder pairs $\alpha_k, \alpha_{k'}$ then being produced by cutting each $\alpha'_k$ at a specified disorder cut point such that the Disorder $k$ possession probability $p_k$ is defined to equal $\mathrm{Prob}(\alpha'_k \geq \text{cut point})$, the level of $\alpha'_k$ judged to constitute having the disorder ($\alpha_k = 0$) thus being defined by selecting the cut point for $\alpha'_k$, with $p_k$ being the proportion of the patients judged to have the disorder as determined by the setting of the $\alpha'_k$ cut point, each disorder thereby having, through the continuous $\alpha'_k$ and its cut point, an assigned level such that a patient exceeding that level has the disorder.
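Claim 22 ties each cut point to $p_k$ through $p_k = \mathrm{Prob}(\alpha'_k \geq \text{cut point})$. Assuming the precursor $\alpha'_k$ is standard normal (the claim does not fix its scale), the cut point is $\Phi^{-1}(1 - p_k)$, which Python's `statistics.NormalDist` evaluates directly. `cut_point_for` is a hypothetical helper name used only for illustration.

```python
from statistics import NormalDist

def cut_point_for(p_k):
    """Cut point c with Prob(alpha'_k >= c) = p_k, assuming alpha'_k ~ N(0, 1)."""
    if not 0.0 < p_k < 1.0:
        raise ValueError("p_k must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - p_k)   # upper-tail quantile of the standard normal

# A user judging 10% of patients to have Disorder k.
cut = cut_point_for(0.10)
```

Judging 10% of patients to have Disorder $k$ puts the cut point at about 1.28 standard deviations above the precursor mean, and $p_k = 0.5$ puts it at the mean.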
23. A method comprising constructing a set of medically or psychiatrically focused dichotomizable patient symptoms or dichotomizable personal characteristics, and selecting for evaluation a specified set of possible medical or psychiatric disorders that each patient has or does not have, with multiple disorders per patient being possible; creating a mathematically expressed model comprising the symptoms or characteristics and the specified disorders selected for evaluation, with a latent health or quality-of-life parameter representing latent aspects of the patient not included in the specified set of disorders, the model including parameters describing how the symptoms or characteristics depend on the specified set of disorders and how the symptoms or characteristics also depend on a latent general health or quality-of-life parameter in such a manner that the patient symptoms or characteristics provide estimation information about each parameter, permitting calibration thereof and predictions of the likelihood of the possible disorders, the model further accounting for a probability that a patient may possess a set of symptoms or characteristics representative of a disorder and yet not have the disorder, and that a patient may lack at least one of the symptoms or characteristics typical of the disorder and yet have the disorder; and using the model, with patient data applied to it, to generate predictions of the probabilities of patients having each of the disorders in the specified set of possible disorders.
24. A method in accordance with claim 23 comprising: constructing a medical or psychiatric diagnosis comprising observed symptoms/characteristics, $X_{ij} = 0$ or $1$ according as Patient $j$ does not or does display Symptom/characteristic $i$, respectively, and selecting a set of possible disorders $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Patient $j$ does or does not have Disorder $k$, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$, identifiable and hence capable of being calibrated, describing how the symptoms/characteristics depend on the selected set of disorders according to the following probability:
$$S_{ij} = \pi^*_i \times (r^*_{i1})^{1-\alpha_{j1}} \times (r^*_{i2})^{1-\alpha_{j2}} \times \cdots \times (r^*_{im})^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of displaying Symptom/characteristic $i$ as determined by the disorders Patient $j$ has and does not have, assuming completeness with respect to the latent health/quality-of-life variable, the product of the $r^*$ terms being taken over the $m$ disorders associated with Symptom/characteristic $i$ as specified by the incidence matrix, $\pi^*_i = \prod_k \pi_{ik}$, and $r^*_{ik} = r_{ik}/\pi_{ik}$, where $r_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient has Disorder $k$) and $\pi_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient does not have Disorder $k$).
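Claim 23 calls for predicted probabilities that each patient has each disorder. One way to obtain them from the claim 24 probability, sketched below under stated assumptions, is to enumerate all $2^K$ disorder patterns and apply Bayes' rule. The sketch assumes that symptoms are conditionally independent given the disorder pattern, that $\pi^*_i$ and $r^*_{ik}$ are already calibrated, and that the prior probabilities of having each disorder are hypothetical and independent across disorders (a simplification; the fuller model associates disorders through $\sigma_{kk'}$). Names and numbers are illustrative, and full enumeration is practical only for a small number of disorders.

```python
from itertools import product
from math import prod

def posterior_disorder_probs(x, symptoms, prior_has):
    """Posterior Prob(Patient j has Disorder k | observed symptoms) for each k.

    x         : observed X_ij (1 = symptom displayed, 0 = not displayed).
    symptoms  : per Symptom i, a pair (pi_i*, {k: r_ik*}) whose dict keys are
                the disorders the incidence matrix ties to Symptom i.
    prior_has : hypothetical prior Prob(has Disorder k), treated as independent
                across k (a simplification of the claimed model).
    """
    K = len(prior_has)
    weights = {}
    for alpha in product((0, 1), repeat=K):        # alpha[k] = 0 means HAS Disorder k
        w = prod(p if a == 0 else 1.0 - p for p, a in zip(prior_has, alpha))
        for x_i, (pi_star, r_star) in zip(x, symptoms):
            s = pi_star
            for k, rk_star in r_star.items():
                s *= rk_star ** (1 - alpha[k])     # S_ij from the claimed formula
            w *= s if x_i == 1 else 1.0 - s        # conditional independence (assumed)
        weights[alpha] = w
    total = sum(weights.values())
    return [sum(w for a, w in weights.items() if a[k] == 0) / total
            for k in range(K)]

symptoms = [
    (0.1, {0: 0.9 / 0.1}),                       # Symptom 0: tied to Disorder 0 only
    (0.2, {1: 0.8 / 0.2}),                       # Symptom 1: tied to Disorder 1 only
    (0.3 * 0.3, {0: 0.9 / 0.3, 1: 0.9 / 0.3}),   # Symptom 2: tied to both disorders
]
post = posterior_disorder_probs([1, 0, 1], symptoms, prior_has=[0.2, 0.2])
```

Here the patient displays Symptoms 0 and 2 but not Symptom 1, so the posterior for Disorder 0 rises to about 0.87 while the posterior for Disorder 1 falls to about 0.16.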
25. A method comprising constructing a set of medically or psychiatrically focused dichotomizable patient symptoms or dichotomizable personal characteristics, and selecting for evaluation a specified set of possible medical or psychiatric disorders that each patient has or does not have, with multiple disorders per patient being possible; creating a mathematically expressed model comprising the symptoms or characteristics and the specified disorders selected for evaluation, with a latent health or quality-of-life parameter representing latent aspects of the patient not included in the specified set of disorders, the model including parameters describing how the symptoms or characteristics depend on the specified set of disorders and how the symptoms or characteristics also depend on a latent general health or quality-of-life parameter, the model further accounting for a probability that a patient may possess a set of symptoms or characteristics representative of a disorder and yet not have the disorder, and that a patient may lack at least one of the symptoms or characteristics typical of the disorder and yet have the disorder, the model defining, for each of some of the psychiatric and medical disorders, a level judged as constituting having that disorder; and using the model, with patient data applied to it, to generate predictions of the probabilities of patients having each of the disorders in the specified set of possible disorders.
26. A method in accordance with claim 25 comprising: constructing a medical or psychiatric diagnosis comprising observed symptoms/characteristics, $X_{ij} = 0$ or $1$ according as Patient $j$ does not or does display Symptom/characteristic $i$, respectively, and selecting a set of possible disorders $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Patient $j$ does or does not have Disorder $k$, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$ describing how the symptoms/characteristics depend on the selected set of disorders according to the following probability:
$$S_{ij} = \pi^*_i \times (r^*_{i1})^{1-\alpha_{j1}} \times (r^*_{i2})^{1-\alpha_{j2}} \times \cdots \times (r^*_{im})^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of displaying Symptom/characteristic $i$ as determined by the disorders Patient $j$ has and does not have, assuming completeness with respect to the latent health/quality-of-life variable, the product of the $r^*$ terms being taken over the $m$ disorders associated with Symptom/characteristic $i$ as specified by the incidence matrix, $\pi^*_i = \prod_k \pi_{ik}$, and $r^*_{ik} = r_{ik}/\pi_{ik}$, where $r_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient has Disorder $k$) and $\pi_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient does not have Disorder $k$), with $p_k$ being the proportion of the patients judged to have the disorder as determined by the user's definition of the disorder.
27. A method comprising constructing a set of medically or psychiatrically focused dichotomizable patient symptoms or dichotomizable personal characteristics, and selecting for evaluation a specified set of possible medical or psychiatric disorders that each patient has or does not have, with multiple disorders per patient being possible; creating a mathematically expressed model comprising the symptoms or characteristics and the specified disorders selected for evaluation, with a latent health or quality-of-life parameter representing latent aspects of the patient not included in the specified set of disorders, the model including parameters describing how the symptoms or characteristics depend on the specified set of disorders and how the symptoms or characteristics also depend on a latent general health or quality-of-life parameter, the model further accounting for a probability that a patient may possess a set of symptoms or characteristics representative of a disorder and yet not have the disorder, and that a patient may lack at least one of the symptoms or characteristics typical of the disorder and yet have the disorder, and the model expressing, for some of the pairs of the selected set of disorders, an association, either positive or negative, between the members of each such pair and further expressing a size measure of the association of each pair of the selected set of disorders that can be estimated from the patient responses to the symptoms or characteristics; and using the model, with patient data applied to it, to generate predictions of the probabilities of patients having each of the disorders in the specified set of possible disorders.
28. A method in accordance with claim 27 comprising: constructing a medical or psychiatric diagnosis comprising observed symptoms/characteristics, $X_{ij} = 0$ or $1$ according as Patient $j$ does not or does display Symptom/characteristic $i$, respectively, and selecting a set of possible disorders $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Patient $j$ does or does not have Disorder $k$, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$ describing how the symptoms/characteristics depend on the selected set of disorders according to the following probability:
$$S_{ij} = \pi^*_i \times (r^*_{i1})^{1-\alpha_{j1}} \times (r^*_{i2})^{1-\alpha_{j2}} \times \cdots \times (r^*_{im})^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of displaying Symptom/characteristic $i$ as determined by the disorders Patient $j$ has and does not have, assuming completeness with respect to the latent health/quality-of-life variable, the product of the $r^*$ terms being taken over the $m$ disorders associated with Symptom/characteristic $i$ as specified by the incidence matrix, $\pi^*_i = \prod_k \pi_{ik}$, and $r^*_{ik} = r_{ik}/\pi_{ik}$, where $r_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient has Disorder $k$) and $\pi_{ik} = \mathrm{Prob}$(Symptom/characteristic $i$ given that the patient does not have Disorder $k$); and expressing the size measure of the association of each pair of disorders indirectly as determined by the correlation $\sigma_{kk'}$ between continuous bivariate normal disorder precursors $\alpha'_k, \alpha'_{k'}$ to the dichotomous disorders $\alpha_k, \alpha_{k'}$, the needed dichotomous disorder pairs $\alpha_k, \alpha_{k'}$ then being produced by cutting each $\alpha'_k$ at a cut point.
29. A method comprising: constructing a set of dichotomizably scored probes and selecting a set of unobservable dichotomized properties possessed or not possessed by each object and intended to assess a latent state of each of the objects being probed; creating a mathematically expressed model comprising the probes and the selected properties, the selected properties being a subset of a larger group of properties influencing probe responses of objects, with an unspecified remainder of the larger group of properties being accounted for in the model by a residual state parameter, the model including parameters describing how the probes depend on the selected set of properties and how the probes also depend on the residual state parameter in such a manner that object responses to the probes provide estimation information about each parameter, permitting calibration thereof and predictions of which properties the objects do or do not possess, the model further accounting for a probability that an object, for each individual probe, may possess all the properties from the subset of the selected set of properties required for a positive response to the individual probe but may fail to apply at least one required property appropriately to the individual probe, thereby responding to the probe negatively, and that each object, for each individual probe, may fail to possess at least one specified property required for a positive response to the probe and nevertheless appropriately apply to the probe the required specified properties that are not possessed, and also appropriately apply the remaining required and possessed properties from the selected set of properties, thereby responding to the probe positively, the model defining the level of possession of each property to be an assigned level judged to confer object possession of the individual property, and the model expressing, for pairs of the selected set of properties, an association, either positive or negative, between the two members of each of the pairs and further expressing a size measure of the positive or negative association of each pair of properties that can be estimated for each pair from the object responses to the individual probes; applying combined probe results obtained from the responses of the objects to calibrate the individual probes of the model; and generating a prediction of possession of the property, a prediction of failure to possess the property, or a withholding of such a prediction for each object and specified property combination.
30. A method in accordance with claim 29 comprising: constructing a set of dichotomously scored probes, $X_{ij} = 0$ or $1$ according as Object $j$ responds negatively or positively to Probe $i$, respectively, and selecting a set of latent properties $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Object $j$ does not possess or possesses Property $k$, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$, identifiable and hence capable of being calibrated, describing how the probes depend on the selected set of latent properties according to the following probability:
$$S_{ij} = \pi^*_i \times (r^*_{i1})^{1-\alpha_{j1}} \times (r^*_{i2})^{1-\alpha_{j2}} \times \cdots \times (r^*_{im})^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of responding positively to Probe $i$ as determined by the specified properties Object $j$ possesses and does not possess, assuming completeness with respect to the residual state, the product of the $r^*$ terms being taken over the $m$ properties required for a positive response to Probe $i$ as specified by the incidence matrix, $\pi^*_i = \prod_k \pi_{ik}$, and $r^*_{ik} = r_{ik}/\pi_{ik}$, where $r_{ik} = \mathrm{Prob}$(positive response to Probe $i$ given that the object does not possess Property $k$) and $\pi_{ik} = \mathrm{Prob}$(positive response to Probe $i$ given that the object does possess Property $k$); and expressing the size measure of the positive association of each pair of properties indirectly as determined by the correlation $\sigma_{kk'}$ between continuous bivariate normal property precursors $\alpha'_k, \alpha'_{k'}$ to the dichotomous properties $\alpha_k, \alpha_{k'}$, the needed dichotomous property pairs $\alpha_k, \alpha_{k'}$ then being produced by cutting each $\alpha'_k$ at a specified possession-level cut point such that the probability $p_k$ that Property $k$ is possessed ($\alpha_k = 1$) is defined to equal $\mathrm{Prob}(\alpha'_k \geq \text{cut point})$, the level of property possession thus being defined by selecting the cut point for $\alpha'_k$, with $p_k$ being the user-decided proportion of objects judged to possess Property $k$, each property thereby having, through the continuous $\alpha'_k$ and its cut point, an assigned level such that an object exceeding that level possesses the property.
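In the probe form of claim 30, the incidence matrix selects which properties enter each probe's product. The sketch below evaluates $S_{ij}$ for every probe for one object. It follows the claim 30 coding ($\alpha_{jk} = 1$ means possession, $\pi_{ik}$ is the positive-response probability given possession, $r_{ik}$ the probability given non-possession), assumes completeness with respect to the residual state, and uses plain nested lists with illustrative values rather than any particular library.

```python
from math import prod

def probe_success_probs(q, pi, r, alpha):
    """S_ij for every probe i, for one object j (claim 30 coding).

    q[i][k]  : incidence matrix, 1 if Probe i requires Property k.
    pi[i][k] : Prob(positive response to Probe i | object possesses Property k).
    r[i][k]  : Prob(positive response to Probe i | object lacks Property k).
    alpha[k] : 1 if Object j possesses Property k, else 0.
    Completeness with respect to the residual state is assumed.
    """
    probs = []
    for q_i, pi_i, r_i in zip(q, pi, r):
        required = [k for k, needed in enumerate(q_i) if needed]
        s = prod(pi_i[k] for k in required)              # pi_i*
        for k in required:
            s *= (r_i[k] / pi_i[k]) ** (1 - alpha[k])    # r_ik* applies only if Property k is lacking
        probs.append(s)
    return probs

# Two probes, two properties; entries for non-required properties are unused.
q = [[1, 0], [1, 1]]
pi = [[0.90, 1.0], [0.85, 0.95]]
r = [[0.20, 1.0], [0.30, 0.40]]
probs = probe_success_probs(q, pi, r, alpha=[1, 0])      # possesses Property 0 only
```

Probe 0 needs only Property 0, which this object has, so its probability stays at 0.90; Probe 1 also needs the missing Property 1, so $\pi^*_1 = 0.85 \times 0.95$ is multiplied by $r^*_{12} = 0.40/0.95$, giving 0.34.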
31. A method comprising: constructing a set of dichotomizably scored probes and selecting a set of unobservable dichotomized properties possessed or not possessed by each object and intended to assess a latent state of each of the objects being probed; creating a mathematically expressed model comprising the probes and the selected properties, the selected properties being a subset of a larger group of properties influencing probe responses of objects, with an unspecified remainder of the larger group of properties being accounted for in the model by a residual state parameter, the model including parameters describing how the probes depend on the selected set of properties and how the probes also depend on the residual state parameter in such a manner that object responses to the probes provide estimation information about each parameter, permitting calibration thereof and predictions of which properties the objects do or do not possess, the model further accounting for a probability that an object, for each individual probe, may possess all the properties from the subset of the selected set of properties required for a positive response to the individual probe but may fail to apply at least one required property appropriately to the individual probe, thereby responding to the probe negatively, and that each object, for each individual probe, may fail to possess at least one selected property required for a positive response to the probe and nevertheless appropriately apply to the probe the required selected properties that are not possessed, and also appropriately apply the unspecified remaining properties from the selected set of properties, thereby responding to the probe positively, and the model expressing, for pairs of the selected set of properties, an association, either positive or negative, between the two members of each of the pairs and further expressing a size measure of the positive or negative association of each pair of properties that can be estimated for each pair from the object responses to the individual probes; applying combined probe results obtained from the responses of the objects to calibrate the individual probes of the model; and generating a prediction of possession of the property, a prediction of failure to possess the property, or a withholding of such a prediction for each object and specified property combination.
32. A method in accordance with claim 31 comprising: constructing a set of dichotomously scored probes, $X_{ij} = 0$ or $1$ according as Object $j$ responds negatively or positively to Probe $i$, respectively, and selecting a set of latent properties $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Object $j$ does not possess or possesses Property $k$, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$, identifiable and hence capable of being calibrated, describing how the probes depend on the selected set of latent properties according to the following probability:
$$S_{ij} = \pi^*_i \times (r^*_{i1})^{1-\alpha_{j1}} \times (r^*_{i2})^{1-\alpha_{j2}} \times \cdots \times (r^*_{im})^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of responding positively to Probe $i$ as determined by the specified properties Object $j$ possesses and does not possess, assuming completeness with respect to the residual state, the product of the $r^*$ terms being taken over the $m$ properties required for a positive response to Probe $i$ as specified by the incidence matrix, $\pi^*_i = \prod_k \pi_{ik}$, and $r^*_{ik} = r_{ik}/\pi_{ik}$, where $r_{ik} = \mathrm{Prob}$(positive response to Probe $i$ given that the object does not possess Property $k$) and $\pi_{ik} = \mathrm{Prob}$(positive response to Probe $i$ given that the object does possess Property $k$); and expressing the size measure of the positive association of each pair of properties indirectly as determined by the correlation $\sigma_{kk'}$ between continuous bivariate normal property precursors $\alpha'_k, \alpha'_{k'}$ to the dichotomous properties $\alpha_k, \alpha_{k'}$, the needed dichotomous property pairs $\alpha_k, \alpha_{k'}$ then being produced by cutting each $\alpha'_k$ at some level.
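To check a calibration procedure, one might first generate synthetic data from the model. The sketch below, restricted to two properties for simplicity, draws correlated standard normal precursors with correlation $\sigma_{kk'}$, cuts them at chosen levels (claim 32 leaves the level open; at or above the cut is taken to mean possession, following claim 30), and then draws each response $X_{ij}$ as a Bernoulli outcome with probability $S_{ij}$. The generator, its arguments and the example values are hypothetical.

```python
import random
from math import sqrt

def simulate_two_property_data(n, sigma, cuts, q, pi, r, seed=0):
    """Hypothetical generator of (alpha, X) for n objects and two properties.

    Precursors are standard bivariate normal with correlation sigma; alpha_k = 1
    (possession) when the precursor is at or above cuts[k]. Each X_ij is drawn as
    Bernoulli(S_ij), with S_ij written in expanded form: the product of pi_ik over
    required properties possessed and r_ik over required properties lacked, which
    equals pi_i* times the product of r_ik*^(1 - alpha_jk).
    """
    rng = random.Random(seed)
    alphas, responses = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        precursors = (z1, sigma * z1 + sqrt(1.0 - sigma ** 2) * z2)  # corr = sigma
        alpha = [1 if a >= c else 0 for a, c in zip(precursors, cuts)]
        row = []
        for q_i, pi_i, r_i in zip(q, pi, r):
            s = 1.0
            for k, needed in enumerate(q_i):
                if needed:
                    s *= pi_i[k] if alpha[k] == 1 else r_i[k]
            row.append(1 if rng.random() < s else 0)
        alphas.append(alpha)
        responses.append(row)
    return alphas, responses

alphas, X = simulate_two_property_data(
    n=20000, sigma=0.6, cuts=[0.0, 0.0],
    q=[[1, 0], [1, 1]], pi=[[0.9, 1.0], [0.85, 0.95]], r=[[0.2, 1.0], [0.3, 0.4]],
)
```

Because the true $\alpha$ values are returned alongside the responses, estimates produced by a calibration routine can be compared directly against them.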
 33. A method comprising: constructing a set of dichotomizably scored probes and selecting a set of unobservable dichotomized properties possessed or not possessed by each object, with the intent to assess a latent state of each of the objects being probed; creating a mathematically expressed model comprising the probes and the selected properties, the selected properties being a subset of a larger group of properties influencing probe responses of objects, with an unspecified remainder of the larger group of properties being accounted for in the model by a residual state parameter, the model including parameters describing how the probes depend on the selected set of properties and how the probes also depend on the residual state parameter in such a manner that object responses to the probes provide estimation information about each parameter, permitting calibration thereof and predictions of which properties the objects do or do not possess, the model further accounting for a probability that an object, for each individual probe, may possess all the properties from the subset of the selected set of properties required for a positive response to the individual probe but may fail to apply at least one required property appropriately to the individual probe, thereby responding to the probe negatively, and that each object, for each individual probe, may have failed to possess at least one selected property required for a positive response to the probe and nevertheless apply appropriately to the probe the required selected properties that are not possessed, and also apply the unspecified remaining properties from the selected set of properties appropriately, thereby responding to the probe positively, the model defining the level of possession of each property to be an assigned level judged to confer object possession of the individual property; applying combined probe results obtained from the responses of the objects to calibrate the individual probes of the model; and generating a prediction of possession of the property, a prediction of failure to possess the property, or a withholding of such a prediction for each object and specified property combination.
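The final step, predicting possession, predicting non-possession, or withholding a prediction, can be sketched in Python. This sketch makes several assumptions not stated in the claim: the calibrated parameters are treated as known, the response probability takes the $\pi^*$, $r^*$ form of the dependent claims with completeness assumed (the residual state is ignored), properties are a priori independent with probabilities $p_k$, and the 0.6/0.4 decision thresholds are purely hypothetical. Enumerating all $2^K$ possession patterns is practical only for a small number K of properties; all names are illustrative.

```python
from itertools import product
from math import prod

def posterior_possession(x, q, pi_star, r_star, p):
    """Posterior Prob(alpha_k = 1 | x) for each property k of one object.

    x:       the object's 0/1 responses, one per probe
    q:       per probe, the indices of the properties it requires
    pi_star: per probe, pi*_i
    r_star:  per probe, a dict {k: r*_ik}
    p:       prior possession probabilities, treated as independent
    """
    n_props = len(p)
    weights = {}
    for alpha in product((0, 1), repeat=n_props):  # all 2**K possession patterns
        prior = prod(p[k] if a else 1 - p[k] for k, a in enumerate(alpha))
        like = 1.0
        for i, xi in enumerate(x):
            s = pi_star[i] * prod(r_star[i][k] ** (1 - alpha[k]) for k in q[i])
            like *= s if xi else 1 - s
        weights[alpha] = prior * like
    total = sum(weights.values())
    return [sum(w for a, w in weights.items() if a[k]) / total
            for k in range(n_props)]

def classify(post, upper=0.6, lower=0.4):
    """Hypothetical three-way rule: predict possession, non-possession, or withhold."""
    return ["possesses" if pk >= upper else "lacks" if pk <= lower else "withheld"
            for pk in post]

# Made-up example: probes 0 and 1 each need one property, probe 2 needs both.
q = [[0], [1], [0, 1]]
pi_star = [0.9, 0.9, 0.85]
r_star = [{0: 0.2}, {1: 0.2}, {0: 0.3, 1: 0.3}]
post = posterior_possession([1, 0, 0], q, pi_star, r_star, p=[0.5, 0.5])
print([round(v, 3) for v in post], classify(post))
```

Posterior probabilities falling between the two thresholds yield "withheld", which corresponds to declining to make a prediction for that object and property combination.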
 34. A method in accordance with claim 33 comprising: constructing a set of dichotomously scored probes, $X_{ij} = 0$ or $1$ according as Object j responds negatively or positively to Probe i, respectively, and selecting a set of latent properties $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Object j does not possess or possesses Property k, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$, identifiable and hence capable of being calibrated, describing how the probes depend on the selected set of latent properties according to the following probability:
$$S_{ij} = \pi_i^* \times (r_{i1}^*)^{1-\alpha_{j1}} \times (r_{i2}^*)^{1-\alpha_{j2}} \times \cdots \times (r_{im}^*)^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of responding positively to Probe i as determined by the specified properties Object j possesses and does not possess, assuming completeness with respect to the residual state; the product of $r^*$'s being taken over the m properties required for a positive response to Probe i as specified by the incidence matrix; $\pi_i^* = \prod_k \pi_{ik}$, with the product over k; and $r_{ik}^* = r_{ik}/\pi_{ik}$, with $r_{ik}$ = Prob(positive response to Probe i given that the object does not possess Property k) and $\pi_{ik}$ = Prob(positive response to Probe i given that the object does possess Property k); with $p_k$ being the user-decided proportion of objects judged to possess Property k.
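Combining the response probability above with the user-decided proportions $p_k$ gives the expected proportion of objects responding positively to Probe i. The following is a minimal sketch under an added assumption: the required properties are possessed independently, which ignores the association between properties modeled in other claims. Under that assumption $E[(r_{ik}^*)^{1-\alpha_k}] = p_k + (1-p_k)\,r_{ik}^*$, and the expectation of the product factors. The function name and values are hypothetical.

```python
from math import prod

def marginal_positive_prob(pi_star, r_star, p):
    """Expected proportion of objects responding positively to Probe i.

    Assumes the m required properties are possessed independently, property k
    with probability p_k. Then E[(r*_ik) ** (1 - alpha_k)] = p_k + (1 - p_k) * r*_ik,
    and independence lets the expectation of the product factor.
    """
    return pi_star * prod(pk + (1 - pk) * r for r, pk in zip(r_star, p))

# Made-up values: pi*_i = 0.9, r*_i1 = 0.5, r*_i2 = 0.4, p_1 = 0.6, p_2 = 0.7.
print(round(marginal_positive_prob(0.9, [0.5, 0.4], [0.6, 0.7]), 4))
```

Comparing such model-implied proportions with the observed proportion of positive responses is one simple way to sanity-check a calibration, though it says nothing about individual objects.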
 35. A method comprising: constructing a set of dichotomizably scored probes and selecting a set of unobservable dichotomized properties possessed or not possessed by each object and designed to assess a latent state of each of the objects being probed; creating a mathematically expressed model comprising the probes and the selected properties, the selected properties being a subset of a larger group of properties influencing probe responses of objects, with an unspecified remainder of the larger group of properties being accounted for in the model by a residual state parameter, the model including parameters describing how the probes depend on the selected set of properties and how the probes also depend on the residual state parameter and predictions of which properties the objects do or do not possess, the model further accounting for a probability that an object, for each individual probe, may possess all the properties from the subset of the selected set of properties required for a positive response to the individual probe but may fail to apply at least one required property appropriately to the individual probe, thereby responding to the probe negatively, and that each object, for each individual probe, may have failed to possess at least one selected property required for a positive response to the probe and nevertheless apply appropriately to the probe the required selected properties that are not possessed, and also apply the unspecified remaining properties from the selected set of properties appropriately, thereby responding to the probe positively, the model defining the level of possession of each property to be an assigned level judged to confer object possession of the individual property, and the model expressing for pairs of the selected set of properties an association, either positive or negative, between the two members of each of the pairs and further expressing a size measure of the positive or negative association of each pair of properties that can be estimated for each pair from the object responses to the individual probes; applying combined probe results obtained from the responses of the objects to calibrate the individual probes of the model; and generating a prediction of possession of the property, a prediction of failure to possess the property, or a withholding of such a prediction for each object and specified property combination.
 36. A method in accordance with claim 35 comprising: constructing a set of dichotomously scored probes, $X_{ij} = 0$ or $1$ according as Object j responds negatively or positively to Probe i, respectively, and selecting a set of latent properties $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Object j does not possess or possesses Property k, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$ describing how the probes depend on the selected set of latent properties according to the following probability:
$$S_{ij} = \pi_i^* \times (r_{i1}^*)^{1-\alpha_{j1}} \times (r_{i2}^*)^{1-\alpha_{j2}} \times \cdots \times (r_{im}^*)^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of responding positively to Probe i as determined by the specified properties Object j possesses and does not possess, assuming completeness with respect to the residual state; the product of $r^*$'s being taken over the m properties required for a positive response to Probe i as specified by the incidence matrix; $\pi_i^* = \prod_k \pi_{ik}$, with the product over k; and $r_{ik}^* = r_{ik}/\pi_{ik}$, with $r_{ik}$ = Prob(positive response to Probe i given that the object does not possess Property k) and $\pi_{ik}$ = Prob(positive response to Probe i given that the object does possess Property k); and expressing the size measure of the positive association of each pair of properties indirectly as determined by the correlation $\sigma_{kk'}$ between continuous bivariate normal property precursors $\alpha'_k$, $\alpha'_{k'}$ to the dichotomous properties $\alpha_k$, $\alpha_{k'}$, the needed dichotomous property pairs $\alpha_k$, $\alpha_{k'}$ then being produced by cutting each $\alpha'_k$ at a specified possession-level cut point such that the Property k possession probability $p_k = \text{Prob}(\alpha_k = 1)$ is defined to equal $\text{Prob}(\alpha'_k \ge \text{cut point})$, the level of property possession thus being defined by selecting the cut point for $\alpha'_k$, with $p_k$ being the user-decided proportion of objects judged to possess Property k, whereby each property, through the continuous $\alpha'_k$ and its cut point, has an assigned level such that an object exceeding that level possesses the property.
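Selecting the cut point from the user-decided proportion $p_k$ amounts to inverting the normal distribution function. The sketch below assumes, as an illustration, that each precursor $\alpha'_k$ is standardized to mean 0 and variance 1, so that $\text{Prob}(\alpha'_k \ge c_k) = p_k$ gives $c_k = \Phi^{-1}(1 - p_k)$. It uses Python's standard-library `statistics.NormalDist`; the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # alpha'_k taken as standard normal (mean 0, sd 1)

def cut_point(p_k):
    """Cut point c_k satisfying Prob(alpha'_k >= c_k) = p_k."""
    if not 0 < p_k < 1:
        raise ValueError("p_k must lie strictly between 0 and 1")
    return STD_NORMAL.inv_cdf(1 - p_k)

def possesses(alpha_prime_k, c_k):
    """Dichotomize the precursor: alpha_k = 1 exactly when alpha'_k >= c_k."""
    return 1 if alpha_prime_k >= c_k else 0

# If the user decides 70% of objects possess Property k, the cut point lies
# below the mean, so an object at the mean is judged to possess the property.
c = cut_point(0.7)
print(round(c, 4), possesses(0.0, c))
```

A larger $p_k$ lowers the cut point, i.e. a less demanding level of possession; the user's choice of $p_k$ is what fixes the meaning of "possessing" Property k.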
 37. A method comprising: constructing a set of dichotomizably scored probes and selecting a set of unobservable dichotomized properties possessed or not possessed by each object, with the intent to assess a latent state of each of the objects being probed; creating a mathematically expressed model comprising the probes and the selected properties, the selected properties being a subset of a larger group of properties influencing probe responses of objects, with an unspecified remainder of the larger group of properties being accounted for in the model by a residual state parameter, the model including parameters describing how the probes depend on the selected set of properties and how the probes also depend on the residual state parameter in such a manner that object responses to the probes provide estimation information about each parameter, permitting calibration thereof and predictions of which properties the objects do or do not possess, the model further accounting for a probability that an object, for each individual probe, may possess all the properties from the subset of the selected set of properties required for a positive response to the individual probe but may fail to apply at least one required property appropriately to the individual probe, thereby responding to the probe negatively, and that each object, for each individual probe, may have failed to possess at least one selected property required for a positive response to the probe and nevertheless apply appropriately to the probe the required selected properties that are not possessed, and also apply the unspecified remaining properties from the selected set of properties appropriately, thereby responding to the probe positively; applying combined probe results obtained from the responses of the objects to calibrate the individual probes of the model; and generating a prediction of possession of the property, a prediction of failure to possess the property, or a withholding of such a prediction for each object and specified property combination.
 38. A method in accordance with claim 37 comprising: constructing a set of dichotomously scored probes, $X_{ij} = 0$ or $1$ according as Object j responds negatively or positively to Probe i, respectively, and selecting a set of latent properties $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Object j does not possess or possesses Property k, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$, identifiable and hence capable of being calibrated, describing how the probes depend on the selected set of latent properties according to the following probability:
$$S_{ij} = \pi_i^* \times (r_{i1}^*)^{1-\alpha_{j1}} \times (r_{i2}^*)^{1-\alpha_{j2}} \times \cdots \times (r_{im}^*)^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of responding positively to Probe i as determined by the specified properties Object j possesses and does not possess, assuming completeness with respect to the residual state; the product of $r^*$'s being taken over the m properties required for a positive response to Probe i as specified by the incidence matrix; $\pi_i^* = \prod_k \pi_{ik}$, with the product over k; and $r_{ik}^* = r_{ik}/\pi_{ik}$, with $r_{ik}$ = Prob(positive response to Probe i given that the object does not possess Property k) and $\pi_{ik}$ = Prob(positive response to Probe i given that the object does possess Property k).
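Why the reparameterized quantities take the form $\pi_i^* = \prod_k \pi_{ik}$ and $r_{ik}^* = r_{ik}/\pi_{ik}$ can be seen from a short derivation. It assumes, for illustration, that a positive response to Probe i requires each of its m properties to be applied successfully, that these applications succeed independently with probability $\pi_{ik}$ when Property k is possessed and $r_{ik}$ when it is not, and that the residual state is ignored (completeness). The key step is the identity $\pi^{\alpha} r^{1-\alpha} = \pi\,(r/\pi)^{1-\alpha}$.

```latex
\begin{aligned}
S_{ij} &= \prod_{k=1}^{m} \pi_{ik}^{\alpha_{jk}}\, r_{ik}^{1-\alpha_{jk}}
        = \prod_{k=1}^{m} \pi_{ik} \left(\frac{r_{ik}}{\pi_{ik}}\right)^{1-\alpha_{jk}} \\
       &= \underbrace{\prod_{k=1}^{m} \pi_{ik}}_{\pi_i^*}
          \prod_{k=1}^{m} \bigl(r_{ik}^*\bigr)^{1-\alpha_{jk}},
        \qquad r_{ik}^* = \frac{r_{ik}}{\pi_{ik}}.
\end{aligned}
```

Under this reading, $\pi_i^*$ is the probability that an object possessing all m required properties responds positively, and each $r_{ik}^* \le 1$ whenever $r_{ik} \le \pi_{ik}$, acting as a multiplicative penalty for lacking Property k.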
 39. A method comprising: constructing a set of dichotomizably scored probes and selecting a set of unobservable dichotomized properties possessed or not possessed by each object, with the intent to assess a latent state of each of the objects being probed; creating a mathematically expressed model comprising the probes and the selected properties, the selected properties being a subset of a larger group of properties influencing probe responses of objects, with an unspecified remainder of the larger group of properties being accounted for in the model by a residual state parameter, the model including parameters describing how the probes depend on the selected set of properties and how the probes also depend on the residual state parameter and predictions of which properties the objects do or do not possess, the model further accounting for a probability that an object, for each individual probe, may possess all the properties from the subset of the selected set of properties required for a positive response to the individual probe but may fail to apply at least one required property appropriately to the individual probe, thereby responding to the probe negatively, and that each object, for each individual probe, may have failed to possess at least one selected property required for a positive response to the probe and nevertheless apply appropriately to the probe the required selected properties that are not possessed, and also apply the unspecified remaining properties from the selected set of properties appropriately, thereby responding to the probe positively, the model defining the level of possession of each property to be an assigned level judged to confer object possession of the individual property; applying combined probe results obtained from the responses of the objects to calibrate the individual probes of the model; and generating a prediction of possession of the property, a prediction of failure to possess the property, or a withholding of such a prediction for each object and specified property combination.
 40. A method in accordance with claim 39 comprising: constructing a set of dichotomously scored probes, $X_{ij} = 0$ or $1$ according as Object j responds negatively or positively to Probe i, respectively, and selecting a set of latent properties $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Object j does not possess or possesses Property k, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$ describing how the probes depend on the selected set of latent properties according to the following probability:
$$S_{ij} = \pi_i^* \times (r_{i1}^*)^{1-\alpha_{j1}} \times (r_{i2}^*)^{1-\alpha_{j2}} \times \cdots \times (r_{im}^*)^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of responding positively to Probe i as determined by the specified properties Object j possesses and does not possess, assuming completeness with respect to the residual state; the product of $r^*$'s being taken over the m properties required for a positive response to Probe i as specified by the incidence matrix; $\pi_i^* = \prod_k \pi_{ik}$, with the product over k; and $r_{ik}^* = r_{ik}/\pi_{ik}$, with $r_{ik}$ = Prob(positive response to Probe i given that the object does not possess Property k) and $\pi_{ik}$ = Prob(positive response to Probe i given that the object does possess Property k); with $p_k$ being the user-decided proportion of objects judged to possess Property k.
 41. A method comprising: constructing a set of dichotomizably scored probes and selecting a set of unobservable dichotomized properties possessed or not possessed by each object, with the intent to assess a latent state of each of the objects being probed; creating a mathematically expressed model comprising the probes and the selected properties, the selected properties being a subset of a larger group of properties influencing probe responses of objects, with an unspecified remainder of the larger group of properties being accounted for in the model by a residual state parameter, the model including parameters describing how the probes depend on the selected set of properties and how the probes also depend on the residual state parameter and predictions of which properties the objects do or do not possess, the model further accounting for a probability that an object, for each individual probe, may possess all the properties from the subset of the selected set of properties required for a positive response to the individual probe but may fail to apply at least one required property appropriately to the individual probe, thereby responding to the probe negatively, and that each object, for each individual probe, may have failed to possess at least one selected property required for a positive response to the probe and nevertheless apply appropriately to the probe the required selected properties that are not possessed, and also apply the unspecified remaining properties from the selected set of properties appropriately, thereby responding to the probe positively, and the model expressing for pairs of the selected set of properties an association, either positive or negative, between the two members of each of the pairs and further expressing a size measure of the positive or negative association of each pair of properties that can be estimated for each pair from the object responses to the individual probes; applying combined probe results obtained from the responses of the objects to calibrate the individual probes of the model; and generating a prediction of possession of the property, a prediction of failure to possess the property, or a withholding of such a prediction for each object and specified property combination.
 42. A method in accordance with claim 41 comprising: constructing a set of dichotomously scored probes, $X_{ij} = 0$ or $1$ according as Object j responds negatively or positively to Probe i, respectively, and selecting a set of latent properties $\{\alpha_{jk}\}$ with $\alpha_{jk} = 0$ or $1$ according as Object j does not possess or possesses Property k, respectively; and creating a mathematically expressed model that includes parameters $\{\pi^*, r^*\}$ describing how the probes depend on the selected set of latent properties according to the following probability:
$$S_{ij} = \pi_i^* \times (r_{i1}^*)^{1-\alpha_{j1}} \times (r_{i2}^*)^{1-\alpha_{j2}} \times \cdots \times (r_{im}^*)^{1-\alpha_{jm}}$$
with $S_{ij}$ being the probability of responding positively to Probe i as determined by the specified properties Object j possesses and does not possess, assuming completeness with respect to the residual state; the product of $r^*$'s being taken over the m properties required for a positive response to Probe i as specified by the incidence matrix; $\pi_i^* = \prod_k \pi_{ik}$, with the product over k; and $r_{ik}^* = r_{ik}/\pi_{ik}$, with $r_{ik}$ = Prob(positive response to Probe i given that the object does not possess Property k) and $\pi_{ik}$ = Prob(positive response to Probe i given that the object does possess Property k); and expressing the size measure of the positive association of each pair of properties indirectly as determined by the correlation $\sigma_{kk'}$ between continuous bivariate normal property precursors $\alpha'_k$, $\alpha'_{k'}$ to the dichotomous properties $\alpha_k$, $\alpha_{k'}$, the needed dichotomous property pairs $\alpha_k$, $\alpha_{k'}$ then being produced by cutting each $\alpha'_k$ at some possession-level cut point.
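How the correlation $\sigma_{kk'}$ between the continuous precursors translates into association between the dichotomous properties can be sketched numerically. This sketch assumes the precursors are standard bivariate normal with correlation $\sigma_{kk'}$ and uses the fact that $\alpha'_{k'}$ given $\alpha'_k = x$ is normal with mean $\sigma_{kk'} x$ and variance $1 - \sigma_{kk'}^2$. The probability that an object possesses both properties is then a one-dimensional integral, evaluated here with Simpson's rule, which is a choice made for this illustration only; the function name and its `span`/`n` parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # precursors alpha'_k, alpha'_k' taken as standard normal

def joint_possession_prob(c_k, c_kp, sigma, span=10.0, n=2000):
    """Prob(alpha_k = 1 and alpha_k' = 1) for standard bivariate normal precursors
    with correlation sigma, cut at c_k and c_kp respectively.

    Uses alpha'_k' | alpha'_k = x ~ N(sigma * x, 1 - sigma**2) and integrates
    pdf(x) * Prob(alpha'_k' >= c_kp | x) over [c_k, c_k + span] by Simpson's rule;
    the tail beyond c_k + span is negligible for moderate cut points.
    """
    if not -1 < sigma < 1:
        raise ValueError("sigma must lie strictly between -1 and 1")
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    sd = sqrt(1 - sigma * sigma)

    def integrand(x):
        return STD.pdf(x) * (1 - STD.cdf((c_kp - sigma * x) / sd))

    a, b = c_k, c_k + span
    h = (b - a) / n
    total = integrand(a) + integrand(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3

# With both cut points at 0 (p_k = p_k' = 0.5), independence would give 0.25;
# a positive correlation raises the chance of possessing both properties.
print(round(joint_possession_prob(0.0, 0.0, 0.0), 4),
      round(joint_possession_prob(0.0, 0.0, 0.5), 4))
```

For cut points at 0 the result can be checked against the known orthant probability $1/4 + \arcsin(\sigma_{kk'})/(2\pi)$, which equals $1/3$ when $\sigma_{kk'} = 0.5$. A positive $\sigma_{kk'}$ makes joint possession more likely than $p_k p_{k'}$, a negative one less likely.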